Abstract

We study a general framework for broadcast gossip algorithms which use companion variables 
to solve the average consensus problem. Each node maintains an initial state and a companion 
variable. Iterative updates are performed asynchronously whereby one random node broadcasts 
its current state and companion variable and all other nodes receiving the broadcast update their 
state and companion variable. We provide conditions under which this scheme is guaranteed 
to converge to a consensus solution, where all nodes have the same limiting values, on any 
strongly connected directed graph. Under stronger conditions, which are reasonable when the 
underlying communication graph is undirected, we guarantee that the consensus value is equal 
to the average, both in expectation and in the mean-squared sense. Our analysis uses tools 
from non-negative matrix theory and perturbation theory. The perturbation results rely on 
a parameter being sufficiently small. We characterize the allowable upper bound as well as 
the optimal setting for the perturbation parameter as a function of the network topology, and 
this allows us to characterize the worst-case rate of convergence. Simulations illustrate that, in 
comparison to existing broadcast gossip algorithms, the approaches proposed in this paper have 
the advantage that they simultaneously can be guaranteed to converge to the average consensus 
and they converge in a small number of broadcasts. 

1 Introduction 

Gossip algorithms are an attractive solution for information processing in applications such as 
distributed signal processing [1], networked control [2], and multi-robot systems [3]. They are
attractive because they require little infrastructure; nodes iteratively pass messages with their 
immediate neighbors in a network until they reach a consensus on the solution. Consequently, 
there is little overhead associated with forming and maintaining specialized routes, and there are 
no bottlenecks or single points of failure. 

Broadcast gossip algorithms, introduced in [4-6], are especially attractive for use in wireless
networks. Unlike the majority of existing gossip algorithms, where messages are either asynchronously
exchanged between pairs of nodes or where nodes synchronously exchange and process messages
with all of their neighbors, in broadcast gossip algorithms nodes asynchronously broadcast a
message and the message contents are immediately processed by all neighbors receiving it. By
exploiting the broadcast nature of wireless communications, broadcast gossip algorithms are more
efficient (they converge after fewer transmissions) than other gossip algorithms [6]. However,
previously proposed broadcast gossip algorithms either converge to a consensus on a random
solution [6], which may not be acceptable in practical applications, or they do not have theoretical
guarantees [7].

In this article we propose and analyze a family of broadcast gossip algorithms for strongly 
connected directed graphs. If the network is symmetric (undirected) or if nodes know their out- 
degree, these algorithms are guaranteed to converge to the average consensus both in expectation 
and in the mean-squared sense. In more general settings, the algorithms are still guaranteed to 
converge to a specific solution which is a convex combination of the initial values at all nodes in 
the network (but not necessarily the average). We give a precise characterization of this solution in
terms of the algorithm parameters. Our analysis combines tools and techniques from nonnegative
matrix theory and matrix perturbation theory. Along these lines, we derive an upper bound on 
the perturbation parameter under which convergence is guaranteed, and we derive an expression 
for the optimal value of the perturbation parameter. 



1.1 Related Work 

Broadcast gossip algorithms (BGAs) are introduced in the series of papers by Aysal et al. [4-6].
The BGAs proposed there involve nodes asynchronously transmitting a scalar-valued message, and
each time a node receives a message from one of its neighbors it performs an update by forming a
convex combination of the received value with its own previous value. Then, when it is a given node's
turn to broadcast next (as determined by a random timer, in the asynchronous model [8,9]), the
node broadcasts its current value. In [4-6] it is shown that, when executed over an undirected
graph (i.e., one with symmetric links), such an algorithm converges to a consensus solution almost
surely. The updates of this algorithm are linear and can be expressed as a random, time-varying
matrix acting on the vector containing the state values at each node. Unlike conventional pairwise or
synchronous gossip algorithms, the matrices in [4-6] corresponding to the update when a particular
node transmits cannot be viewed as the transition matrix of a reversible Markov chain, and so the 
average is not preserved from iteration to iteration. Consequently, although the consensus value is 
equal to the average of the initial values at every node in expectation, for any particular sample 
path (where the randomness is in the sequence determining the order in which nodes broadcast) 
the consensus value is randomly distributed about the average of their initial values but is not 
precisely equal to it. 



Subsequent work [10] investigates related BGAs, demonstrating that their convergence
properties are robust even when the broadcasts from different nodes may interfere at a receiver.
A broadcast-based algorithm has also been proposed for solving distributed convex optimization
problems [11].

A modified BGA is proposed by Franceschelli et al. [7,12], where nodes maintain a companion
(or surplus) variable in addition to the state variable they seek to average. By careful accounting
of both the companion and state variables, a conservation principle is established, and simulation
results suggest that the algorithm with companion variables converges to the average consensus
for all sample paths, not just in expectation. However, no proof of convergence or theoretical
convergence rate analysis is available for the algorithm of [7,12].



Recent work of Cai and Ishii [13,14] analyzes related distributed averaging algorithms on
directed graphs that use companion variables. The two types of algorithms analyzed in [13,14]
involve asynchronous pairwise updates and synchronous updates. They make use of tools from
matrix perturbation theory, and the work in the present article can be seen as generalizing the
results in [13,14] for broadcast gossip updates.



1.2 Contributions and Paper Organization 

The contributions of this article are as follows. In Section 2 we propose a general framework for
broadcast gossip algorithms over directed graphs using companion variables. For this framework
we determine conditions on the algorithm parameters under which convergence to a consensus is
guaranteed both in expectation (in Section 3) and in the mean-squared sense (in Section 4).

We then consider two specific instances of the general framework in Sections 5 and 6. In one
instance, which we refer to as unbiased broadcast gossip algorithms (cf. Section 5), the consensus
value is guaranteed to be the average of the initial values. In the other instance (biased broadcast
gossip algorithms, Section 6), the consensus value is no longer the average of the initial values, but it
depends on the stationary distribution of a Markov chain associated with the algorithm parameters. 
The unbiased algorithm requires that each node be aware of its out-degree, the number of nodes 
that receive its broadcasts. This is a reasonable assumption in networks where connectivity is 
symmetric, but it may not be reasonable in networks with directed edges. In particular, if there are 
directed edges, then there is no immediate feedback link, making it more challenging for a node to 
identify the out-neighbors that receive its broadcasts. This motivates further study of the biased 
BGAs, which are more practical in such scenarios because they do not require that nodes know 
their out-degree. 

Our analysis of the general framework makes use of tools from matrix perturbation theory. In 
particular, the way in which the information in the companion variables is incorporated back into the 
main state variables depends on a parameter which can be viewed as controlling the extent to which 
a baseline linear system is perturbed. For sufficiently small values of the perturbation parameter, 
the algorithm is guaranteed to converge. In Section 7 we determine a tight upper bound on the
allowable values for the perturbation parameter for biased broadcast gossip algorithms. This bound
constitutes an improvement over previous bounds along these lines because it explicitly takes into
account the structure of the graph through spectral properties of a corresponding graph Laplacian
matrix. In addition to determining this bound, we identify a topology-dependent optimal value
for the perturbation parameter in Section 8, and we obtain an expression for the resulting second
largest eigenvalue, which governs the worst-case rate of convergence.

Simulation results, reported in Section 9, demonstrate that the proposed broadcast gossip
algorithms fare well compared to the existing algorithms [6,7]. The algorithm of [6] converges quickly
but can converge to a consensus value which is very far from the average. The algorithm of [7]
converges to the average consensus but requires significantly more iterations than the algorithm
of [6]. The algorithms proposed here converge quickly and they can be made to converge to the
average consensus. We conclude in Section 10.



1.3 Notation 

Before proceeding, we summarize some of the notation used in this article. Let $x \in \mathbb{R}^n$ be an
$n$-dimensional column vector. The Euclidean norm of $x$ is denoted by $\|x\|_2$. Let $A$ be an $n \times n$ matrix
with real-valued entries. Let $[A]_{i,j}$ denote the entry in the $i$th row and $j$th column of $A$; we also
write $A_{i,j}$ when there is no ambiguity. The $\infty$-norm of $A$ is given by $\|A\|_\infty = \max_i \sum_{j=1}^{n} |A_{i,j}|$,
the largest absolute row sum, and the 1-norm of $A$ is given by $\|A\|_1 = \max_j \sum_{i=1}^{n} |A_{i,j}|$, the largest
absolute column sum. The spectral radius of $A$ is the largest modulus of an eigenvalue of $A$ and
is denoted by $\rho(A) = \max_{i=1,\dots,n} |\lambda_i(A)|$, where $\lambda_1(A), \dots, \lambda_n(A)$ are the eigenvalues of $A$. For the
vector $x \in \mathbb{R}^n$, let $\mathrm{diag}(x)$ denote an $n \times n$ diagonal matrix with $[\mathrm{diag}(x)]_{i,i} = x_i$. For an $n \times n$ matrix
$A$, let $\mathrm{diag}(A)$ denote an $n$-dimensional column vector with $[\mathrm{diag}(A)]_i = A_{i,i}$.



2 Framework for Broadcast Gossip Algorithms 
2.1 Network Model 

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a directed graph which represents the network connectivity, where $\mathcal{V} = \{1, \dots, n\}$
is the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of directed edges. The network contains a directed edge
$(i, j) \in \mathcal{E}$ if and only if node $i$ receives messages transmitted by node $j$. Let $\mathcal{N}_i^{\mathrm{in}} = \{j \in \mathcal{V} : (i, j) \in \mathcal{E}\}$
and $\mathcal{N}_i^{\mathrm{out}} = \{k \in \mathcal{V} : (k, i) \in \mathcal{E}\}$ denote the sets of in-neighbors and out-neighbors, respectively, of
node $i$. For the rest of this paper we make the following assumption.

Assumption 1. The graph $\mathcal{G}$ is strongly connected; i.e., for any pair of nodes $i, j \in \mathcal{V}$, there exists
a sequence of nodes $i = i_0, i_1, i_2, \dots, i_m = j$ such that $(i_{\ell-1}, i_\ell) \in \mathcal{E}$ for all $\ell = 1, \dots, m$.
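
Assumption 1 can be checked numerically before running any of the algorithms. The following is a minimal sketch, not part of the original analysis: it uses two breadth-first searches, and the helper names `reachable` and `is_strongly_connected` are hypothetical. Edges follow the convention above, where a pair (i, j) means that node i receives the broadcasts of node j.

```python
from collections import deque

def reachable(adj, start):
    """Return the set of nodes reachable from `start` by following `adj`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(nodes, edges):
    """edges: iterable of (i, j) pairs meaning node i receives from node j."""
    nodes = list(nodes)
    if not nodes:
        return True
    fwd = {u: [] for u in nodes}   # transmitter -> receivers
    rev = {u: [] for u in nodes}   # receiver -> transmitters
    for i, j in edges:
        fwd[j].append(i)
        rev[i].append(j)
    root = nodes[0]
    # Strongly connected iff every node is reachable from the root in the
    # graph and in the graph with all edges reversed.
    return (len(reachable(fwd, root)) == len(nodes)
            and len(reachable(rev, root)) == len(nodes))

ring = [(1, 0), (2, 1), (0, 2)]    # 0 -> 1 -> 2 -> 0
chain = [(1, 0), (2, 1)]           # 0 -> 1 -> 2, no way back
```

Here `is_strongly_connected(range(3), ring)` holds, while the chain fails the test because node 0 never hears from anyone.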



2.2 Distributed Averaging 

The goal of any broadcast gossip algorithm is to accomplish distributed averaging. Each node $i \in \mathcal{V}$
initially has a value, $x_i(0) \in \mathbb{R}$, and the goal is for all nodes to compute the average, $\frac{1}{n} \sum_{i=1}^{n} x_i(0)$.
In general, distributed averaging algorithms seek to achieve consensus on the average while only
allowing messages to be passed between neighboring nodes, as defined by the communication graph
$\mathcal{G}$. In broadcast gossip algorithms we make the following additional restrictions. Each node has a
unique id, which corresponds to its index in the set {1, . . . , n}. When a node transmits a message, 
the message is received by all of its out-neighbors. The receiving nodes may know the id of the 
transmitter, but the transmitter will not, in general, know the ids of the receivers. Equivalently, 
each node knows the ids of its in-neighbors but not its out-neighbors. 



2.3 Asynchronous Time Model 

Following [6], we adopt the standard asynchronous time model [9]. Each node runs a clock which
ticks according to an independent rate $1/n$ Poisson process. When node $k$'s clock ticks it initiates
a broadcast gossip update, the details of which are described in the subsection that follows. Since
the clocks at each node are independent, this model is equivalent to running a single, global Poisson
clock which ticks at rate 1, and assigning each tick uniformly and independently to one node in $\mathcal{V}$.
In the sequel we use the variable $t \in \{1, 2, \dots\}$ to index the ticks of this global Poisson clock. Each
global clock tick corresponds to one update or iteration.
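
The two descriptions of the time model can be sketched side by side. This is an illustrative simulation under the stated model only; the function names `local_clock_schedule` and `global_clock_schedule`, the horizon, and the seeds are assumptions made for the example.

```python
import random

def local_clock_schedule(n, horizon, seed=0):
    """Merge n independent rate-1/n Poisson clocks; return the transmitter order."""
    rng = random.Random(seed)
    events = []
    for node in range(n):
        # Inter-tick times of a rate-1/n Poisson process are exponential
        # with mean n; expovariate takes the rate as its argument.
        t = rng.expovariate(1.0 / n)
        while t <= horizon:
            events.append((t, node))
            t += rng.expovariate(1.0 / n)
    events.sort()
    return [node for _, node in events]

def global_clock_schedule(n, num_ticks, seed=0):
    """Equivalent view: each global tick is assigned to a uniformly random node."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(num_ticks)]

schedule = local_clock_schedule(5, 1000.0, seed=1)
```

With five nodes and a horizon of 1000 time units, the merged clocks produce roughly 1000 ticks, consistent with a global clock of rate 1.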



2.4 Broadcast Gossip Updates 

Similar to previous broadcast gossip algorithms with companion variables [7,13], every node $i$
maintains two variables, $x_i(t)$ and $y_i(t)$. The first variable, $x_i(t)$, is the estimate of the average at
node $i$ after $t$ iterations, and it is initialized to $x_i(0)$, the same initial value from Sec. 2.2. The second
variable, $y_i(t)$, is the companion variable at node $i$ after $t$ iterations, and it is initialized to $y_i(0) = 0$.
The companion variables (called "surplus" variables in [14]) play the role of compensating for
asymmetric updates made to $x_i(t)$, and if they are updated carefully, the companion variables can
be used to ensure that consensus is achieved on the average.

When a node's clock ticks, it initiates an update by broadcasting its current state and companion
value. Suppose that the $(t+1)$st global clock tick occurs at node $k$. Then $k$ broadcasts the values
$x_k(t)$ and $y_k(t)$, and all out-neighbors $j \in \mathcal{N}_k^{\mathrm{out}}$ of $k$, which receive this information, set

$$ x_j(t+1) = (1 - a_{j,k}) x_j(t) + a_{j,k} x_k(t) + \epsilon d_j^{(k)} y_j(t) \tag{1} $$
$$ y_j(t+1) = a_{j,k} (x_j(t) - x_k(t)) + (1 - \epsilon d_j^{(k)}) y_j(t) + b_{j,k} y_k(t), \tag{2} $$

where the values of the algorithm parameters $a_{j,k}$, $b_{j,k}$, $d_j^{(k)}$, and $\epsilon > 0$ will be specified below. The
transmitting node $k$ sets

$$ x_k(t+1) = x_k(t) \tag{3} $$
$$ y_k(t+1) = 0, \tag{4} $$

and all other nodes $i \notin \{k\} \cup \mathcal{N}_k^{\mathrm{out}}$ keep

$$ x_i(t+1) = x_i(t) \tag{5} $$
$$ y_i(t+1) = y_i(t). \tag{6} $$

Note that the nodes need not be aware of the global clock index $t$ to implement this protocol; they
can simply update two local registers (one for $x_i$ and one for $y_i$) when they either broadcast a
message or receive a broadcast. Below we continue to keep track of the global clock index for the
purposes of analysis.
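
The updates (1)-(6) translate almost line for line into code. The sketch below is illustrative only: the function name `broadcast_update`, the 4-node undirected ring, and the parameter values (weights 0.5, unit surplus weights, perturbation parameter 0.05) are assumptions chosen for the example, not values prescribed here. Because each column of the `b` weights sums to 1 in this example, the total of all states plus all companion variables is the same before and after every update, which the last lines record.

```python
import random

def broadcast_update(x, y, k, out_nbrs, a, b, d, eps):
    """Apply updates (1)-(6) in place when node k broadcasts.

    a[j][k], b[j][k] hold the weights a_{j,k}, b_{j,k}; d[k][j] holds d_j^{(k)}.
    """
    xk, yk = x[k], y[k]
    for j in out_nbrs[k]:
        xj, yj = x[j], y[j]
        x[j] = (1 - a[j][k]) * xj + a[j][k] * xk + eps * d[k][j] * yj          # (1)
        y[j] = a[j][k] * (xj - xk) + (1 - eps * d[k][j]) * yj + b[j][k] * yk   # (2)
    y[k] = 0.0  # (4); x[k] is unchanged (3), other nodes keep their values (5)-(6)

# Hypothetical example: undirected ring of 4 nodes.
n = 4
out_nbrs = {k: [(k - 1) % n, (k + 1) % n] for k in range(n)}
a = [[0.5 if j in out_nbrs[k] else 0.0 for k in range(n)] for j in range(n)]
b = [[0.5 if j in out_nbrs[k] else 0.0 for k in range(n)] for j in range(n)]  # columns sum to 1
d = [[1.0 if j in out_nbrs[k] else 0.0 for j in range(n)] for k in range(n)]
eps = 0.05

x = [1.0, 2.0, 3.0, 4.0]
y = [0.0] * n
total_before = sum(x) + sum(y)
rng = random.Random(0)
for _ in range(100):
    broadcast_update(x, y, rng.randrange(n), out_nbrs, a, b, d, eps)
total_after = sum(x) + sum(y)
```

The conservation of `sum(x) + sum(y)` depends on the columns of `b` summing to 1; with other choices of `b` the total is generally not preserved.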

Different choices of the parameters $a_{j,k}$, $b_{j,k}$, $d_j^{(k)}$, and $\epsilon$ lead to different broadcast gossip
algorithms with different properties; we will examine two particular choices of interest in Sections 5
and 6. Note that the seminal broadcast gossip algorithm of [6] is recovered by setting $\epsilon = 0$ and
$a_{j,k} = \gamma$ for all $(j, k) \in \mathcal{E}$. The broadcast gossip algorithm of [7] does not directly fit the form
considered here, since in [7], the receiving nodes also use $y_k(t)$ to calculate $x_j(t + 1)$.

The broadcast gossip updates (1)-(6) are linear, and below we will use tools from linear algebra,
spectral graph theory, and matrix perturbation theory to analyze their convergence properties. To
this end, we introduce some additional notation. Let $A$ and $B$ be $n \times n$ matrices with entries
$[A]_{i,j} = a_{i,j}$ and $[B]_{i,j} = b_{i,j}$, respectively, satisfying

$$ \begin{cases} 0 < a_{i,j} \le 1 & \text{if } (i,j) \in \mathcal{E} \\ a_{i,j} = 0 & \text{if } (i,j) \notin \mathcal{E} \end{cases} \tag{7} $$

and

$$ \begin{cases} 0 < b_{i,j} \le 1 & \text{if } (i,j) \in \mathcal{E} \\ b_{i,j} = 0 & \text{if } (i,j) \notin \mathcal{E}. \end{cases} \tag{8} $$

The matrices are graph-conformant in the sense that they have non-zero entries in locations
corresponding to the edges of $\mathcal{G}$.

We write $e_k \in \mathbb{R}^n$ for the $k$th canonical vector: the vector with all entries equal to 0 except for
the $k$th entry, which is equal to 1. We also write $\mathbf{1}$ (respectively $\mathbf{0}$) for an $n$-dimensional vector with
all entries equal to 1 (respectively 0).

Define $A_k = A e_k e_k^T$ and $B_k = B e_k e_k^T$. One can verify that $B_k$ is an $n \times n$ matrix, the $k$th column
of $B_k$ is identical to that of $B$, and all other entries of $B_k$ are zero (and similar properties hold for
$A_k$ in relation to $A$). It also follows directly from the definitions of $A_k$ and $B_k$ that $A = \sum_{k \in \mathcal{V}} A_k$
and $B = \sum_{k \in \mathcal{V}} B_k$.

The matrices $A$ and $B$ can be viewed as weighted adjacency matrices of the graph $\mathcal{G}$ (possibly
assigning different weights to each edge). From this view, the matrices $A_k$ and $B_k$ correspond to
weighted adjacency matrices of a graph $\mathcal{G}_k$ obtained from $\mathcal{G}$ by eliminating all edges except those
of the form $(i, k)$ for some $i \in \mathcal{V}$, i.e., by retaining only those edges emanating from $k$. Thus, $\mathcal{G}_k$
represents the graph of active edges when node $k$ transmits.

Finally, with this view of $A_k$ as a weighted adjacency matrix on $\mathcal{G}_k$, let $L_k = \mathrm{diag}(A_k \mathbf{1}) - A_k$
denote the corresponding (directed) graph Laplacian. It follows from the definition of $L_k$ that
$L_k \mathbf{1} = \mathbf{0}$. It also follows from the definition of $A_k$ that $\sum_{k \in \mathcal{V}} L_k = \mathrm{diag}(A \mathbf{1}) - A = L$, where $L$ is
the Laplacian corresponding to the graph with weighted adjacency matrix $A$.
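
The identities stated for the per-node matrices are easy to confirm numerically. The following sketch assumes NumPy is available and uses a hypothetical directed 3-node ring with all weights equal to 0.5; the function name `per_node_matrices` is also an assumption.

```python
import numpy as np

def per_node_matrices(A):
    """Return the lists [A_k] and [L_k] built from a weighted adjacency matrix A."""
    n = A.shape[0]
    I = np.eye(n)
    # A_k = A e_k e_k^T keeps only the k-th column of A.
    A_k = [A @ np.outer(I[:, k], I[:, k]) for k in range(n)]
    # L_k = diag(A_k 1) - A_k is the Laplacian of the active edges.
    L_k = [np.diag(Ak @ np.ones(n)) - Ak for Ak in A_k]
    return A_k, L_k

# A[i, j] > 0 iff node i receives the broadcasts of node j.
A = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5],
              [0.5, 0.0, 0.0]])
A_k, L_k = per_node_matrices(A)
L = np.diag(A @ np.ones(3)) - A
```

For this example the matrices `A_k` sum to `A`, every `L_k` annihilates the all-ones vector, and the matrices `L_k` sum to `L`.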

The remaining algorithm parameters to discuss are $d_j^{(k)}$ and $\epsilon$. Let $d^{(k)} \in [0, 1]^n$ denote a vector
with values satisfying

$$ \begin{cases} d_j^{(k)} > 0 & \text{if } (j,k) \in \mathcal{E} \\ d_j^{(k)} = 0 & \text{if } (j,k) \notin \mathcal{E}, \end{cases} \tag{9} $$

and let $D_k = \mathrm{diag}(d^{(k)})$ denote a diagonal matrix with $[D_k]_{i,i} = d_i^{(k)}$. The positive weights $\epsilon d_j^{(k)}$
determine the amount of $j$'s own surplus it injects into an update of $x_j(t + 1)$ when $j$ receives a
broadcast from node $k$. The parameter $\epsilon > 0$ will be treated as a perturbation parameter in our
analysis below, and through this analysis we will obtain: 1) an upper bound on how large $\epsilon$ can
be made while still ensuring convergence, as well as 2) an indication of how $\epsilon$ affects the rate of
convergence.

Define the $2n \times 2n$ matrix $W_k$ to be

$$ W_k = \begin{bmatrix} I - L_k & \epsilon D_k \\ L_k & S_k - \epsilon D_k \end{bmatrix}, \tag{10} $$

where $S_k = I - e_k e_k^T + B_k$. The general broadcast gossip updates (1)-(6) can be compactly written
as

$$ \begin{bmatrix} x(t+1) \\ y(t+1) \end{bmatrix} = W(t) \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}, \tag{11} $$

where $W(t)$ is a random matrix with $W(t) = W_k$ when node $k$ transmits at iteration $t$. In the
asynchronous time model, the random matrices $W(t)$, $t = 1, 2, \dots$, are independent and identically
distributed, and $W(t) = W_k$ with probability $1/n$ for all $k \in \mathcal{V}$.
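
A short sketch of how the matrices in (10) might be assembled, assuming NumPy. The function name `build_W`, the directed 3-node ring, the choice of taking the surplus weight vectors equal to the columns of `B`, and the value 0.1 for the perturbation parameter are all assumptions made for illustration.

```python
import numpy as np

def build_W(A, B, d, eps):
    """Return [W_1, ..., W_n] as in (10); d[k] is the vector d^{(k)}."""
    n = A.shape[0]
    I = np.eye(n)
    Ws = []
    for k in range(n):
        ek = I[:, [k]]                       # e_k as a column vector
        Ak = A @ ek @ ek.T                   # k-th column of A, zeros elsewhere
        Bk = B @ ek @ ek.T
        Lk = np.diag(Ak.sum(axis=1)) - Ak    # L_k = diag(A_k 1) - A_k
        Dk = np.diag(d[k])
        Sk = I - ek @ ek.T + Bk
        Ws.append(np.block([[I - Lk, eps * Dk],
                            [Lk, Sk - eps * Dk]]))
    return Ws

P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # i hears j iff P[i, j] = 1
A = 0.5 * P
B = P.copy()                                  # each column sums to 1
d = [B[:, k] for k in range(3)]               # positive exactly on the out-neighbors of k
Ws = build_W(A, B, d, eps=0.1)
z = np.concatenate([np.ones(3), np.zeros(3)])
```

In this example every `W_k` leaves the stacked vector `z` (ones over zeros) unchanged, because each `L_k` annihilates the all-ones vector. Since the columns of `B` sum to 1, the all-ones row vector of length 6 is also preserved from the left, which is the matrix form of conserving the total of states and companion variables.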



3 Convergence in Expectation 



Next, we focus on identifying properties that the parameters $a_{i,j}$, $b_{i,j}$, $d_j^{(k)}$, and $\epsilon$ must satisfy
in order to guarantee that the iterations (11) converge in expectation. Since $L_k$ contains some
negative entries, $W_k$ is not nonnegative, and so standard results from nonnegative matrix analysis
and the study of Markov chains are not sufficient to guarantee convergence in expectation. Our
approach will make use of a combination of techniques from the theory of nonnegative matrices
and perturbation theory.

Taking the conditional expectation of (11) with respect to the random node that broadcasts at
each iteration, given the initial values $x(0)$ and $y(0)$, we obtain



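$$ \mathbb{E}\left[ \begin{bmatrix} x(t+1) \\ y(t+1) \end{bmatrix} \,\middle|\, \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} \right] = \bar{W} \, \mathbb{E}\left[ \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} \,\middle|\, \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} \right] \tag{12} $$
$$ = \bar{W}^{t+1} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix}, \tag{13} $$

where $\bar{W} = \frac{1}{n} \sum_{k \in \mathcal{V}} W_k$. One can verify that $W_k [\mathbf{1}^T \ \mathbf{0}^T]^T = [\mathbf{1}^T \ \mathbf{0}^T]^T$ for all $k \in \mathcal{V}$ since $L_k \mathbf{1} = \mathbf{0}$,
and so $[\mathbf{1}^T \ \mathbf{0}^T]^T$ is also a right eigenvector of $\bar{W}$ corresponding to the eigenvalue 1. The main result
of this section is the following.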

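Theorem 1. In addition to the constraints (7), (8), and (9) imposed on the algorithm parameters
above, suppose that $\|B\|_\infty \le 1$ or $\|B\|_1 \le 1$. Then under the assumption that $\mathcal{G}$ is strongly connected
(Assumption 1), there exists a value $\eta > 0$ such that if $\epsilon \in (0, \eta]$, then 1 is a simple eigenvalue of $\bar{W}$
with corresponding left eigenvector $[w_1^T \ w_2^T]^T$ normalized such that $[w_1^T \ w_2^T] [\mathbf{1}^T \ \mathbf{0}^T]^T = w_1^T \mathbf{1} = 1$,
and

$$ \lim_{t \to \infty} \mathbb{E}\left[ \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} \,\middle|\, \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} \right] = \begin{bmatrix} (w_1^T x(0)) \mathbf{1} \\ \mathbf{0} \end{bmatrix}. \tag{14} $$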



Remark 1. As a consequence of Theorem 1, the broadcast gossip updates (1)-(6) will converge
to the average consensus if and only if $w_1 = \frac{1}{n} \mathbf{1}$. From the expression for $\bar{W}$ derived below (see
eqn. (15)), it turns out that this is only possible if $w_2 = \frac{1}{n} \mathbf{1}$ and $\mathbf{1}^T B = \mathbf{1}^T$.

The rest of this section is devoted to the proof of Theorem 1.



3.1 Preliminaries and The Plan 



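From (10), we find that the expected update matrix $\bar{W}$ has the form

$$ \bar{W} = \underbrace{\begin{bmatrix} I - \bar{L} & 0 \\ \bar{L} & \bar{S} \end{bmatrix}}_{\bar{W}_0} + \epsilon \underbrace{\begin{bmatrix} 0 & \bar{D} \\ 0 & -\bar{D} \end{bmatrix}}_{E}, \tag{15} $$

where, recalling that $L = \mathrm{diag}(A \mathbf{1}) - A$ is the Laplacian of the graph with weighted adjacency
matrix $A$, we have

$$ \bar{L} = \frac{1}{n} \sum_{k \in \mathcal{V}} L_k = \frac{1}{n} L, \tag{16} $$
$$ \bar{D} = \frac{1}{n} \sum_{k \in \mathcal{V}} D_k = \frac{1}{n} \mathrm{diag}\Big( \sum_{k \in \mathcal{V}} d^{(k)} \Big), \tag{17} $$
$$ \bar{S} = \frac{1}{n} \sum_{k \in \mathcal{V}} S_k = \Big( 1 - \frac{1}{n} \Big) I + \frac{1}{n} B. \tag{18} $$

Using the expression (15) for $\bar{W}$, one can verify the statement made in Remark 1 above.

From (15), it is evident that $\bar{W}$ can be viewed as a perturbed version of the matrix $\bar{W}_0$. The
proof of Theorem 1 involves first characterizing the eigenvalues of $\bar{W}_0$ using concepts from the
theory of nonnegative matrices. Then results from perturbation theory can be used to determine
the eigenvalues of $\bar{W}$ as a function of $\epsilon$ and the eigenvalues of $\bar{W}_0$ and $E$. Before proceeding, we
briefly review background material from nonnegative matrix theory and perturbation theory.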



3.2 Background 

Recall that a matrix $F$ is called nonnegative if all of its entries are greater than or equal to zero. A
square nonnegative matrix $F$ is primitive if there exists a positive integer $k$ such that all entries of
$F^k$ are strictly positive. If $F$ corresponds to the weighted adjacency matrix of a strongly connected
graph, then it is irreducible; if, in addition, at least one of its diagonal entries is positive, then it is
primitive [15].



Next we recall some definitions and results from perturbation theory [16].

Lemma 1 ([16], Sec. 2.4). Suppose that a matrix $F(\epsilon)$ is continuously differentiable (entry-wise)
with respect to the perturbation parameter $\epsilon$. Then the eigenvalues of $F(\epsilon)$ are continuous functions
of $\epsilon$.

An eigenvalue of $F(\epsilon)$ is called stable if it does not depend on $\epsilon$. An eigenvalue of $F(\epsilon)$ is called
semi-simple if its algebraic multiplicity is equal to its geometric multiplicity.

Lemma 2 ([16], Sec. 2.8). Suppose that $F(\epsilon) = F_0 + \epsilon E$. Let $\lambda_0$ be a semi-simple double eigenvalue
of $F_0$ with corresponding left eigenvectors $u_1$ and $u_2$ and right eigenvectors $v_1$ and $v_2$ normalized
such that $u_1^T v_1 = 1$ and $u_2^T v_2 = 1$. Then, for $\epsilon > 0$, the eigenvalue $\lambda_0$ bifurcates into two distinct
eigenvalues $\lambda_{0,1}(\epsilon)$ and $\lambda_{0,2}(\epsilon)$ of $F(\epsilon)$. The bifurcation is given in the form of the power series

$$ \lambda_{0,1}(\epsilon) = \lambda_0 + \epsilon \lambda' + o(\epsilon) \tag{19} $$
$$ \lambda_{0,2}(\epsilon) = \lambda_0 + \epsilon \lambda'' + o(\epsilon), \tag{20} $$

where $\lambda'$ and $\lambda''$ are the eigenvalues of the $2 \times 2$ matrix

$$ \begin{bmatrix} u_1^T E v_1 & u_1^T E v_2 \\ u_2^T E v_1 & u_2^T E v_2 \end{bmatrix}. \tag{21} $$
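
The first-order expansion in Lemma 2 can be illustrated on a small hand-made example. The sketch below assumes NumPy; the matrices `F0` and `E` and the step size are hypothetical and were chosen only so that the eigenvectors are the canonical vectors and the 2 x 2 matrix in (21) can be read off directly.

```python
import numpy as np

F0 = np.diag([1.0, 1.0, 0.5])      # eigenvalue 1 is semi-simple with multiplicity 2
E = np.array([[0.0,  1.0, 0.3],
              [0.0, -1.0, 0.2],
              [0.1,  0.4, 0.0]])

# For this F0, the left and right eigenvectors for eigenvalue 1 are the first
# two canonical vectors, so the matrix in (21) is the leading 2x2 block of E.
U = np.eye(3)[:, :2]               # columns u_1, u_2 (left eigenvectors)
V = np.eye(3)[:, :2]               # columns v_1, v_2 (right eigenvectors)
slopes = np.sort(np.linalg.eigvals(U.T @ E @ V).real)[::-1]   # lambda', lambda''

eps = 1e-3
eigs = np.linalg.eigvals(F0 + eps * E)
near_one = np.sort(eigs.real[np.abs(eigs - 1.0) < 0.1])[::-1]
predicted = 1.0 + eps * slopes     # first-order prediction from (19)-(20)
```

The two eigenvalues of the perturbed matrix that lie near 1 agree with the first-order prediction up to terms of order `eps**2`.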



3.3 Eigenvalues of $\bar{W}_0$

In order to apply the perturbation results mentioned above, we need to first identify the eigenvalues
of $\bar{W}_0$. Observe, from (15), that $\bar{W}_0$ is block lower triangular, and so the eigenvalues of $\bar{W}_0$ are the
collective eigenvalues of $I - \bar{L}$ and $\bar{S}$.

Lemma 3. The matrix $I - \bar{L}$ is primitive, its largest eigenvalue is 1, and all other eigenvalues of
$I - \bar{L}$ have moduli strictly less than 1.

Proof. Recall that $a_{i,j} \in [0, 1]$ according to the constraints (7), and so $I - \bar{L}$ is a square nonnegative
matrix. Moreover, all diagonal elements of $I - \bar{L}$ are strictly positive. Since $\mathcal{G}$ is strongly connected,
it follows that $I - \bar{L}$ is irreducible, and combining these facts gives that $I - \bar{L}$ is primitive. Since
$I - \bar{L}$ is primitive and $(I - \bar{L}) \mathbf{1} = \mathbf{1}$, we have $\rho(I - \bar{L}) \ge 1$. Also, its spectral radius is bounded by
$\rho(I - \bar{L}) \le \|I - \bar{L}\|_\infty = 1$. Thus its largest eigenvalue is 1 and, according to the Perron-Frobenius
Theorem [17], all other eigenvalues of $I - \bar{L}$ have moduli strictly less than 1. □

Based on Lemma 3 we know that $\bar{W}_0$ has at least one eigenvalue equal to 1. Next we need to
determine the eigenvalues of $\bar{S} = (1 - \frac{1}{n}) I + \frac{1}{n} B$. If $\lambda_i(B)$ is an eigenvalue of $B$, then $\lambda_i(\bar{S}) =
1 - \frac{1}{n} + \frac{1}{n} \lambda_i(B)$ is an eigenvalue of $\bar{S}$, and so the real task is to characterize the eigenvalues of $B$. If
all eigenvalues of $B$ have magnitude less than 1, then all eigenvalues of $\bar{S}$ also have magnitude less
than 1, and so 1 is a simple eigenvalue of $\bar{W}_0$. On the other hand, if 1 is an eigenvalue of $B$ then it is
also an eigenvalue of $\bar{S}$, in which case 1 is a multiple eigenvalue of $\bar{W}_0$.

Under the assumptions of Theorem 1 we have that $\|B\|_\infty \le 1$ or $\|B\|_1 \le 1$. Since $\rho(B) \le
\min\{\|B\|_\infty, \|B\|_1\}$, it follows that the largest eigenvalue of $B$ is no larger than 1. Moreover, it
follows from Assumption 1 and (8) that $B$ corresponds to the weighted adjacency matrix of a
strongly connected digraph, and hence $B$ is irreducible. Thus, $\bar{S} = (1 - \frac{1}{n}) I + \frac{1}{n} B$ is irreducible
with all diagonal entries positive, and is therefore primitive. Then, from the Perron-Frobenius
Theorem, the largest eigenvalue $\lambda_1(\bar{S}) = 1 - \frac{1}{n} + \frac{1}{n} \lambda_1(B)$ of $\bar{S}$ is simple and all other eigenvalues
of $\bar{S}$ have magnitude strictly less than $\lambda_1(\bar{S})$.

It turns out that under the condition $\|B\|_\infty \le 1$ or $\|B\|_1 \le 1$ there are two possible cases:
either $\lambda_1(\bar{S}) < 1$ or $\lambda_1(\bar{S}) = 1$. These cases are captured by the two following lemmas.



Lemma 4. Suppose that $B$ is either row stochastic, column stochastic, or doubly stochastic. Then
1 is a simple eigenvalue of $\bar{S}$ and all other eigenvalues of $\bar{S}$ have moduli strictly less than 1.

Lemma 4 follows from standard arguments in the theory of nonnegative matrices [15,17].

Lemma 5. Suppose that either

$$ \max_j \sum_{i=1}^{n} b_{i,j} \le 1 \quad \text{and} \quad \min_j \sum_{i=1}^{n} b_{i,j} < 1 \tag{22} $$

or

$$ \max_i \sum_{j=1}^{n} b_{i,j} \le 1 \quad \text{and} \quad \min_i \sum_{j=1}^{n} b_{i,j} < 1. \tag{23} $$

Then the modulus of the largest eigenvalue of $\bar{S}$ is strictly less than 1.



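Proof. We will prove the lemma for the case where (22) holds; the proof for the other case follows
by a similar argument where $\bar{S}$ is replaced with $\bar{S}^T$. If $\max_j \sum_{i=1}^{n} b_{i,j} < 1$ then the claim follows
since

$$ \rho(\bar{S}) \le \|\bar{S}\|_1 = 1 - \frac{1}{n} + \frac{1}{n} \max_j \sum_{i=1}^{n} b_{i,j} < 1. \tag{24} $$

If

$$ \min_j \sum_{i=1}^{n} b_{i,j} < \max_j \sum_{i=1}^{n} b_{i,j} = 1, \tag{25} $$

then suppose, to arrive at a contradiction, that $\rho(\bar{S}) = 1$. Let $u$ denote the right eigenvector of $\bar{S}$
for which $\bar{S} u = u$, and let $u$ be normalized such that $\mathbf{1}^T u = 1$. Then $B u = u$ also, and summing
over the entries of $u$ we find that

$$ \sum_{i=1}^{n} u_i = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{i,j} u_j \tag{26} $$
$$ = \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} b_{i,j} \Big) u_j \tag{27} $$
$$ < \sum_{j=1}^{n} u_j, \tag{28} $$

where the inequality follows from the assumption (25), together with the fact that the entries of $u$
are positive because $\bar{S}$ is primitive. Since this is a contradiction, it must be
true that all eigenvalues of $\bar{S}$ have moduli strictly less than 1 when either (22) or (23) holds. □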

To summarize, in this subsection we have shown that 1 is an eigenvalue of $\bar{W}_0$ with multiplicity
at least 1. If $B$ is sub-stochastic (i.e., if $B$ satisfies (22) or (23), so that some row or column sums
to a value less than 1), then 1 is a simple eigenvalue of $\bar{W}_0$. On the other hand, if $B$ is a (row-,
column-, or doubly) stochastic matrix, then 1 is an eigenvalue of $\bar{W}_0$ with multiplicity 2. In the next
subsection we apply tools from perturbation theory to characterize the eigenvalues of $\bar{W}$.

3.4 Perturbation Analysis

The proof that the broadcast gossip iterations converge in expectation hinges on showing that 1 is a
simple eigenvalue of $\bar{W}$ with appropriate corresponding eigenvectors. When 1 is a simple eigenvalue
of $\bar{W}_0$ then the analysis is straightforward.



Proposition 1. Suppose that either of the conditions (22) or (23) of Lemma 5 holds. Then there
exists a number $\eta > 0$ such that if $\epsilon \in (0, \eta]$ then 1 is a simple eigenvalue of the matrix $\bar{W} = \bar{W}_0 + \epsilon E$
in (15) and the moduli of all other eigenvalues of $\bar{W}$ are strictly less than 1.

Proof. From the discussion in Section 3.3 we know that the eigenvalues of $\bar{W}_0$ are the collective
eigenvalues of $I - \bar{L}$ and $\bar{S}$. Under the conditions of the proposition, Lemmas 3 and 5 provide that 1
is a simple eigenvalue of $\bar{W}_0$ and the moduli of all other eigenvalues of $\bar{W}_0$ are strictly less than 1.
Observe that $\bar{W} = \bar{W}_0 + \epsilon E$ depends on $\epsilon$ in a continuous manner, and so the eigenvalues of $\bar{W}$ are
continuous functions of the perturbation parameter $\epsilon$. Moreover, 1 remains an eigenvalue of $\bar{W}$ for
every $\epsilon$, since $\bar{W} [\mathbf{1}^T \ \mathbf{0}^T]^T = [\mathbf{1}^T \ \mathbf{0}^T]^T$. Therefore, there exists $\eta > 0$ so that 1 is a
simple eigenvalue of $\bar{W}$ and the moduli of all other eigenvalues of $\bar{W}$ are strictly less than 1 when
$\epsilon \in (0, \eta]$. □

When 1 is a double eigenvalue of $\bar{W}_0$ we arrive at the same conclusion, but the proof requires a
bit more effort.

Proposition 2. Suppose that $B$ is either row stochastic, column stochastic, or doubly stochastic.
Then there exists a number $\eta > 0$ such that if $\epsilon \in (0, \eta]$ then 1 is a simple eigenvalue of the matrix
$\bar{W}$ and the moduli of all other eigenvalues of $\bar{W}$ are strictly less than 1.



Proof. The proof follows from a generalization of an argument in [14]. Under the conditions of
the proposition, Lemmas 3 and 4 provide that 1 is an eigenvalue of $\bar{W}_0$ with multiplicity 2. One
can verify that 1 is a semi-simple eigenvalue of $\bar{W}_0$ since there exist two linearly independent right
eigenvectors $u_1$ and $u_2$, with corresponding linearly independent left eigenvectors $v_1$ and $v_2$. These
eigenvectors can be taken to be

$$ u_1 = \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix}, \quad u_2 = \begin{bmatrix} \mathbf{0} \\ q \end{bmatrix}, \quad v_1 = \begin{bmatrix} p \\ \mathbf{0} \end{bmatrix}, \quad v_2 = \frac{1}{w^T q} \begin{bmatrix} w - p \\ w \end{bmatrix}, \tag{29} $$

where $w$ is the eigenvector of $\bar{S}$ satisfying $w^T \bar{S} = w^T$ and $w^T \mathbf{1} = 1$, $p$ is the left eigenvector of
$I - \bar{L}$ corresponding to the eigenvalue 1, and $q$ is the right eigenvector of $\bar{S}$ corresponding to the
eigenvalue 1, normalized so that $p^T \mathbf{1} = 1$ and $\mathbf{1}^T q = 1$. Note that these eigenvectors exist as a
consequence of Lemmas 3 and 4. One can verify that

$$ v_1^T u_1 = v_2^T u_2 = 1, \qquad v_1^T u_2 = v_2^T u_1 = 0. \tag{30} $$

Also note that all entries of the vectors $w$, $p$, and $q$ are positive since $I - \bar{L}$ and $\bar{S}$ are primitive
matrices.

Applying Lemma 2 with these values, we find that the semi-simple eigenvalue 1 bifurcates into
two eigenvalues

$$ \lambda_{1,1}(\epsilon) = 1 + \epsilon \lambda' + o(\epsilon) \tag{31} $$
$$ \lambda_{1,2}(\epsilon) = 1 + \epsilon \lambda'' + o(\epsilon), \tag{32} $$

where $\lambda'$ and $\lambda''$ are the eigenvalues of the matrix

$$ \begin{bmatrix} v_1^T E u_1 & v_1^T E u_2 \\ v_2^T E u_1 & v_2^T E u_2 \end{bmatrix} = \begin{bmatrix} 0 & p^T \bar{D} q \\ 0 & -\frac{1}{w^T q} p^T \bar{D} q \end{bmatrix}. \tag{33} $$

Thus, we have $\lambda' = 0$. Also, since $p$, $q$, $w$, and the diagonal entries of $\bar{D}$ are all strictly positive,
we have $\lambda'' = -\frac{1}{w^T q} p^T \bar{D} q < 0$. It follows that $\lambda_{1,1} = 1$ is the stable eigenvalue of $\bar{W}$ corresponding
to the right eigenvector $[\mathbf{1}^T \ \mathbf{0}^T]^T$. Moreover, for sufficiently small $\epsilon$, we have $\lambda_{1,2}(\epsilon) < 1$ since
$d\lambda_{1,2}(\epsilon)/d\epsilon \,|_{\epsilon=0} = \lambda'' < 0$. Therefore, there must exist a positive constant $\eta_1$ so that $|\lambda_{1,2}(\epsilon)| <
|\lambda_{1,1}(\epsilon)|$ when $\epsilon \in (0, \eta_1]$. In addition, the eigenvalues of $\bar{W}$ are continuous functions of $\epsilon$, so the
moduli of $\lambda_{1,1}(\epsilon) = \lambda_1(\bar{W})$ and $\lambda_{1,2}(\epsilon) = \lambda_2(\bar{W})$ will dominate the moduli of all other eigenvalues of
$\bar{W}$ provided that $\epsilon > 0$ is sufficiently small; i.e., there exists an $\eta_2 > 0$ such that $\max_{i \ge 3} |\lambda_i(\bar{W})| < 1$
when $\epsilon \in (0, \eta_2]$. Therefore, when $\epsilon \in (0, \min\{\eta_1, \eta_2\}]$, then 1 is a simple eigenvalue of $\bar{W}$ and the
moduli of all other eigenvalues are strictly less than 1. □
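
The behaviour described by Proposition 2 can be observed on a small example. The sketch below assumes NumPy and assembles the expected update matrix from the block form in (15)-(18) for a hypothetical directed 3-node ring with weights 0.5, a permutation matrix for `B` (which is column stochastic), and surplus weight vectors taken equal to the columns of `B`. The function names and the value 0.1 for the perturbation parameter are assumptions made for illustration.

```python
import numpy as np

def expected_matrix(A, B, d_sum, eps):
    """Assemble the expected update matrix from (15)-(18).

    d_sum is the vector obtained by summing the vectors d^{(k)} over k.
    """
    n = A.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, n))
    L_bar = (np.diag(A.sum(axis=1)) - A) / n
    D_bar = np.diag(d_sum) / n
    S_bar = (1 - 1 / n) * I + B / n
    W0 = np.block([[I - L_bar, Z], [L_bar, S_bar]])
    E = np.block([[Z, D_bar], [Z, -D_bar]])
    return W0 + eps * E

def sorted_moduli(M):
    """Eigenvalue moduli of M in decreasing order."""
    return np.sort(np.abs(np.linalg.eigvals(M)))[::-1]

P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # directed 3-ring
A, B = 0.5 * P, P.copy()
d_sum = B.sum(axis=1)          # sum over k of the columns of B

mods_0 = sorted_moduli(expected_matrix(A, B, d_sum, eps=0.0))
mods_eps = sorted_moduli(expected_matrix(A, B, d_sum, eps=0.1))
```

Without perturbation the eigenvalue 1 appears twice. With the perturbation parameter set to 0.1, one copy stays at 1 and the other moves to `1 - 0.1 / 3` for this particular ring, while all remaining eigenvalues have smaller modulus.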

We are now ready to complete the proof of Theorem [T] 



3.5 Proof of Theorem Q] 



Suppose that 1 is a simple eigenvalue of a matrix W and the moduli of all other eigenvalues are 
strictly less than 1. Let u and v denote the left and right ei genv ectors of W corresponding to the 
eigenvalue 1, normalized so that u T 



1. Then it is known 



14 



18 



that lim 



t— >co 



w 



uv 



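From Propositions 1 and 2 we know that, under the conditions of Theorem 1, 1 is a simple
eigenvalue of $\bar{W}$ with corresponding right eigenvector $[\mathbf{1}^T \; \mathbf{0}^T]^T$, and the moduli of all other
eigenvalues of $\bar{W}$ are strictly less than 1. Let $[v_1^T \; v_2^T]^T$ be the left eigenvector of $\bar{W}$ satisfying

$$[v_1^T \; v_2^T]\, \bar{W} = [v_1^T \; v_2^T], \qquad (34)$$

normalized such that $v_1^T \mathbf{1} = 1$. Then the expected broadcast gossip updates (13) converge to a limit

$$\lim_{t \to \infty} E\left[ \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} \,\middle|\, x(0), y(0) \right] = \lim_{t \to \infty} \bar{W}^t \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} \qquad (35)$$

$$= \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} [v_1^T \; v_2^T] \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} \qquad (36)$$

$$= \begin{bmatrix} (v_1^T x(0))\, \mathbf{1} \\ \mathbf{0} \end{bmatrix}, \qquad (37)$$

where the last line follows since $y(0) = \mathbf{0}$. This completes the proof of Theorem 1.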
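The limit $W^t \to v u^T$ used in this proof can be checked numerically. The following is a minimal sketch, assuming a hypothetical $2 \times 2$ row-stochastic matrix chosen only for illustration (it is not one of the gossip update matrices); the helper names are likewise illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matrix_power(w, t):
    """Return w**t for a square matrix w and an integer t >= 0."""
    n = len(w)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, w)
    return result


# Hypothetical matrix: eigenvalue 1 is simple, the other eigenvalue is 0.5.
W = [[0.9, 0.1],
     [0.4, 0.6]]
v = [1.0, 1.0]  # right eigenvector: W v = v
u = [0.8, 0.2]  # left eigenvector: u^T W = u^T, scaled so that u^T v = 1

limit = [[v[i] * u[j] for j in range(2)] for i in range(2)]  # v u^T
W_200 = matrix_power(W, 200)  # W^t for a large t
```

Every row of `W_200` approaches `u`, so repeated application of `W` maps any initial vector to the constant vector with value $u^T x(0)$, which is the mechanism behind (35)-(37).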



4 Convergence in the Second Moment 



The previous section dealt with convergence in expectation. Next we present a general condition
for convergence in the second moment of the broadcast gossip algorithm described in Section 2.

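Theorem 2. Suppose that $B$ is a (row-, column-, or doubly) stochastic matrix. Let $v \in \mathbb{R}^n$ be the
vector satisfying $v^T B = v^T$, normalized such that $v^T \mathbf{1} = 1$. The sequence of vectors
$\{x(t), y(t)\}_{t=1}^{\infty}$ generated by the broadcast gossip updates of Section 2 satisfies

$$\lim_{t \to \infty} E\left[ \left\| \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} - \begin{bmatrix} (v^T x(0))\, \mathbf{1} \\ \mathbf{0} \end{bmatrix} \right\|_2^2 \,\middle|\, x(0), y(0) \right] = 0 \qquad (38)$$

if and only if

$$\rho\left( E[W(t) \otimes W(t)] - \left( \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} \otimes \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} \right) \left( \begin{bmatrix} v \\ v \end{bmatrix} \otimes \begin{bmatrix} v \\ v \end{bmatrix} \right)^T \right) < 1. \qquad (39)$$
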
Remark 2. Theorem 2 can be viewed as generalizing the convergence conditions for linear iterations
described in [18, 19] to update matrices which have the form (10).



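Proof. We first prove that (39) implies (38). Let $z(t) = [x(t)^T \; y(t)^T]^T$ and define the error vector
$m(t) = z(t) - J z(0)$, where

$$J = \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} [v^T \; v^T]. \qquad (40)$$

Observe that $W_k J = J$ for all $k \in \mathcal{V}$, since $L_k \mathbf{1} = \mathbf{0}$, and therefore $m(t+1) = W(t) m(t)$. Let
$M(t) = m(t) m(t)^T$. Then $M(t+1) = W(t) M(t) W(t)^T$. Construct a vector $\tilde{m}(t) \in \mathbb{R}^{4n^2}$ by stacking
the elements of $M(t)$ column-wise, and observe that

$$E[\tilde{m}(t) \mid \tilde{m}(0)] = \prod_{s=0}^{t-1} E[W(s) \otimes W(s)]\, \tilde{m}(0) \qquad (41)$$

$$= E[W(1) \otimes W(1)]^t\, \tilde{m}(0), \qquad (42)$$

since the matrices $W(t)$ are independent and identically distributed.

Under the assumption that $v^T B = v^T$ one can verify that $[v^T \; v^T] E[W(t)] = [v^T \; v^T]$. Note that
such a vector exists since $B$ is primitive* and either row or column stochastic (by assumption).
We have also seen that $E[W(t)] [\mathbf{1}^T \; \mathbf{0}^T]^T = [\mathbf{1}^T \; \mathbf{0}^T]^T$. It follows that
$([v^T \; v^T]^T \otimes [v^T \; v^T]^T)$ and $([\mathbf{1}^T \; \mathbf{0}^T]^T \otimes [\mathbf{1}^T \; \mathbf{0}^T]^T)$ are the left and right eigenvectors of
$E[W(t) \otimes W(t)]$ corresponding to the eigenvalue 1. If assumption (39) holds, then we have

$$\lim_{t \to \infty} E[\tilde{m}(t) \mid \tilde{m}(0)] = \left( \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} \otimes \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} \right) \left( \begin{bmatrix} v \\ v \end{bmatrix} \otimes \begin{bmatrix} v \\ v \end{bmatrix} \right)^T \tilde{m}(0) \qquad (43)$$

$$= \mathbf{0}, \qquad (44)$$

where the last equality follows because $[v^T \; v^T]^T$ is orthogonal to $m(0)$ and hence the vector
$([v^T \; v^T]^T \otimes [v^T \; v^T]^T)$ is orthogonal to $\tilde{m}(0)$.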



Observe that (43) and (44) imply that $E[m_i(t)^2] \to 0$ for all $i$, and therefore

$$E[m(t)^T m(t) \mid m(0)] = \sum_{i=1}^{2n} E[m_i(t)^2] \to 0, \qquad (45)$$

which gives us (38).



Next, to see that (38) implies (39), observe that if (38) holds then it must be true that
$E[m_i(t)^2] \to 0$ for all $i = 1, \dots, 2n$. By the Cauchy-Schwarz inequality, we have

$$E[m_i(t) m_j(t)]^2 \leq E[m_i(t)^2] \cdot E[m_j(t)^2] \to 0. \qquad (46)$$

Therefore, each entry in the matrix $E[m(t) m(t)^T]$ tends to 0 as $t \to \infty$, independent of $m(0)$, which
implies that (39) must hold. □
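The step from $M(t+1) = W(t) M(t) W(t)^T$ to the Kronecker-product recursion relies on the identity $\mathrm{vec}(W M W^T) = (W \otimes W)\,\mathrm{vec}(M)$ for column-wise stacking. The sketch below checks this identity on a small example; the matrix `W` and the vector `m` are hypothetical values chosen only for illustration, and the helper functions are not part of the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product of two matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def vec(m):
    """Stack the columns of a matrix into one list (column-wise stacking)."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]


W = [[0.5, 0.5],
     [0.25, 0.75]]                       # hypothetical update matrix
m = [1.0, -2.0]                          # hypothetical error vector
M = [[mi * mj for mj in m] for mi in m]  # M = m m^T

lhs = vec(matmul(matmul(W, M), transpose(W)))  # vec(W M W^T)
WW = kron(W, W)
stacked = vec(M)
rhs = [sum(WW[r][c] * stacked[c] for c in range(len(stacked)))
       for r in range(len(WW))]                # (W kron W) vec(M)
```

Because the identity is linear in $M$, taking expectations over independent, identically distributed update matrices yields the product form in (41) and (42).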



5 Unbiased Broadcast Gossip 



This section proposes a particular choice of values for the parameters $a_{j,k}$, $B_{j,k}$, and $d_j^{(k)}$,
corresponding to a particular family of broadcast gossip algorithms. For the choice considered in this
section, we guarantee that the broadcast gossip updates of Section 2 converge to the average consensus.
For this reason we refer to these as unbiased broadcast gossip algorithms (UBGA).

*To see why, recall that $\mathcal{G}$ is strongly connected (Assumption 1).

Recall that $\mathcal{N}_j^-$ denotes the set of out-neighbors of node $j$ and $\mathcal{N}_j^+$ denotes the set of in-neighbors of
node $j$. Let $|\mathcal{N}|$ denote the cardinality of the set $\mathcal{N}$. Let $\delta_j^- = |\mathcal{N}_j^-|$ denote the out-degree of
node $j$, and let $\delta_j^+ = |\mathcal{N}_j^+|$ denote its in-degree.



Unbiased broadcast gossip algorithms are obtained by setting 



4" 



B 



i/s} 


if j G Af k 





otherwise 









otherwise 



(47) 



(48) 



and taking $a_{j,k}$ to be any values which satisfy the constraints (7). In order to implement such a
protocol, each node $j \in \mathcal{N}_k^-$ that receives messages from $k$ needs to know $\delta_k^-$, the out-degree of
the broadcasting node $k$. If $k$ knows its out-degree (i.e., the number of neighbors that receive its
broadcasts) then this can be accomplished by having $k$ broadcast the value of $\delta_k^-$ to all nodes in
$\mathcal{N}_k^-$. If $\mathcal{G}$ is undirected†, as is assumed in [6], then $\delta_k^- = \delta_k^+$, and so it is reasonable for $k$ to know
its out-degree. On the other hand, in a general directed graph it may be difficult or impractical
for $k$ to know its out-degree since $k$ may not receive messages directly from all nodes $j \in \mathcal{N}_k^-$. In
Section 6 below we describe and analyze an alternative algorithm which does not require knowledge
of $\delta_k^-$, but for which we are not guaranteed to achieve consensus on the average. First we discuss
theoretical convergence guarantees for UBGA.

First, note that for $\epsilon$ sufficiently small, UBGA asymptotically converges in expectation to the
average consensus in the sense that $E[x(t)] \to \frac{1}{n} \mathbf{1}\mathbf{1}^T x(0)$ and $E[y(t)] \to \mathbf{0}$. To see why, observe
that for the choice of parameters given in (48) we have $\mathbf{1}^T B = \mathbf{1}^T$. Therefore, by Theorem 1 (see
also Remark 1) there exists an $\eta > 0$ such that UBGA converges in expectation to the average
consensus for $\epsilon \in (0, \eta]$. It turns out that UBGA also converges to the average consensus solution
in the mean-squared sense.



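Proposition 3. Let the parameters $d_j^{(k)}$ and $B_{j,k}$ be chosen as in (47) and (48), and take $v = \frac{1}{n}\mathbf{1}$.
Then there exists a constant $\eta > 0$ such that if $\epsilon \in (0, \eta]$ then (39) holds, and so

$$\lim_{t \to \infty} E\left[ \left\| \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} - \begin{bmatrix} \frac{1}{n}(\mathbf{1}^T x(0))\, \mathbf{1} \\ \mathbf{0} \end{bmatrix} \right\|_2^2 \,\middle|\, x(0), y(0) \right] = 0. \qquad (49)$$
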

Proof. To prove the proposition, we need to show that there exists an $\eta > 0$ such that (39) holds.
Convergence in the second moment then follows from Theorem 2.

To show that (39) holds for sufficiently small $\epsilon$, we use a perturbation argument similar to the one
used in Section 3. Since $(I - E[L_k])\mathbf{1} = \mathbf{1}$, the matrix $I - E[L_k]$ has 1 as an eigenvalue with right
eigenvector $\mathbf{1}$. It follows that there exists a corresponding left eigenvector $w$ satisfying
$w^T (I - E[L_k]) = w^T$ and $w^T \mathbf{1} = 1$, and all entries of $w$ are positive. Consequently, the following four
equalities hold:

$$E[(I - L_k) \otimes (I - L_k)]\, (\mathbf{1} \otimes \mathbf{1}) = (\mathbf{1} \otimes \mathbf{1}) \qquad (50)$$
$$(w \otimes \mathbf{1})^T E[(I - L_k) \otimes S_k] = (w \otimes \mathbf{1})^T \qquad (51)$$
$$(\mathbf{1} \otimes w)^T E[S_k \otimes (I - L_k)] = (\mathbf{1} \otimes w)^T \qquad (52)$$
$$(\mathbf{1} \otimes \mathbf{1})^T E[S_k \otimes S_k] = (\mathbf{1} \otimes \mathbf{1})^T \qquad (53)$$



†I.e., $(i,j) \in \mathcal{E}$ if and only if $(j,i) \in \mathcal{E}$.
Thus, the four matrices $E[(I - L_k) \otimes (I - L_k)]$, $E[(I - L_k) \otimes S_k]$, $E[S_k \otimes (I - L_k)]$, and
$E[S_k \otimes S_k]$ have eigenvalue 1. These matrices are all non-negative and irreducible under the
constraints (7) and since $\mathcal{G}$ is strongly connected. Moreover, since the corresponding eigenvectors
$(\mathbf{1} \otimes \mathbf{1})$, $(w \otimes \mathbf{1})$, $(\mathbf{1} \otimes w)$, and $(\mathbf{1} \otimes \mathbf{1})$ are all positive, it follows from the Perron-Frobenius
Theorem that 1 is the largest eigenvalue of the four matrices above, and all other eigenvalues have
moduli strictly less than 1.

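In order to apply a perturbation argument to analyze the eigenvalues of $E[W(t) \otimes W(t)]$, let us
write $W_k = M_k + \epsilon F_k$ with

$$M_k = \begin{bmatrix} I - L_k & 0 \\ L_k & S_k \end{bmatrix}, \qquad (54)$$

$$F_k = \begin{bmatrix} 0 & D_k \\ 0 & -D_k \end{bmatrix}. \qquad (55)$$

Similarly, we have $W(t) = M(t) + \epsilon F(t)$, with $M(t)$ and $F(t)$ being random matrices drawn from
the collection $\{(M_k, F_k)\}$ depending on which node $k$ broadcasts at iteration $t$. In the following we
omit the dependence on $t$ to simplify notation. With the above definitions, we have
$E[W \otimes W] = E[M \otimes M] + \epsilon E[M \otimes F + F \otimes M + F \otimes \epsilon F]$. There is a $4n^2 \times 4n^2$ permutation matrix
$P$ such that $P^T E[W \otimes W] P = \mathcal{M} + \epsilon \mathcal{F}$, where

$$\mathcal{M} = E \begin{bmatrix} (I-L) \otimes (I-L) & 0 & 0 & 0 \\ (I-L) \otimes L & (I-L) \otimes S & 0 & 0 \\ L \otimes (I-L) & 0 & S \otimes (I-L) & 0 \\ L \otimes L & L \otimes S & S \otimes L & S \otimes S \end{bmatrix}, \qquad (56)$$

$$\mathcal{F} = E \begin{bmatrix} 0 & (I-L) \otimes D & D \otimes (I-L) & D \otimes \epsilon D \\ 0 & -(I-L) \otimes D & D \otimes L & D \otimes (S - \epsilon D) \\ 0 & L \otimes D & -D \otimes (I-L) & (S - \epsilon D) \otimes D \\ 0 & -L \otimes D & -D \otimes L & -S \otimes D - D \otimes S + D \otimes \epsilon D \end{bmatrix}. \qquad (57)$$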



Since 1 is a simple eigenvalue of each of the matrices $E[(I-L) \otimes (I-L)]$, $E[(I-L) \otimes S]$,
$E[S \otimes (I-L)]$, and $E[S \otimes S]$, and all other eigenvalues of these matrices have moduli less than 1,
we find that 1 is a semi-simple eigenvalue of $\mathcal{M}$ with multiplicity 4, and all other eigenvalues of
$\mathcal{M}$ have moduli strictly less than 1.

We will use a perturbation argument to show that for sufficiently small $\epsilon > 0$, the largest
eigenvalue of $\mathcal{M} + \epsilon\mathcal{F}$ is 1 and 1 is a simple eigenvalue. Our argument is based on a generalization
of Lemma 2 that addresses bifurcation of a quadruple semi-simple eigenvalue rather than a double
semi-simple eigenvalue [16].

Let $\lambda_i(\epsilon)$, $i = 1, \dots, 4$, denote the four bifurcating eigenvalues of $\mathcal{M} + \epsilon\mathcal{F}$. Similar to Lemma 2,
we have $\lambda_i(\epsilon) = 1 + \epsilon\xi_i + o(\epsilon)$, where $\xi_i$, $i = 1, \dots, 4$, are the four eigenvalues of a matrix of similar
structure to (21). Solving for $\xi_1, \dots, \xi_4$, we find that the derivatives of the eigenvalues $\lambda_i(\epsilon)$ with
respect to $\epsilon$ are given by

$$\frac{d\lambda_1(\epsilon)}{d\epsilon} = 0, \qquad (58)$$

$$\frac{d\lambda_2(\epsilon)}{d\epsilon} = \frac{d\lambda_3(\epsilon)}{d\epsilon} = -N v_1^T E[D] v_2 < 0, \qquad (59)$$

$$\frac{d\lambda_4(\epsilon)}{d\epsilon} = -2N v_1^T E[D] v_2 < 0, \qquad (60)$$

where $v_1$ is the positive left eigenvector of $I - \bar{L}$ normalized such that $v_1^T \mathbf{1} = 1$, and $v_2$ is the
positive right eigenvector of $\bar{S}$ normalized so that $\mathbf{1}^T v_2 = 1$. By an argument similar to the one
leading to (16), it follows that there exists a real number $\eta > 0$ such that 1 is a simple eigenvalue of
the matrix $E[W \otimes W]$ and the moduli of all other eigenvalues are smaller than 1 when $\epsilon \in (0, \eta]$. Thus,
(39) holds, and convergence in the second moment follows from Theorem 2. □

6 Biased Broadcast Gossip 

The previous section proposed UBGA, a broadcast gossip algorithm which provably converges to 
the average consensus in both expectation and in the mean-squared sense. UBGA is practical in 
situations when the network can be guaranteed to be undirected, or when nodes otherwise know 
their out-degree. For instance, one could enforce that only symmetric links are used by having 
each node broadcast its set of in-neighbors and then only updating using messages from neighbors 
for which the neighborhood relationship is symmetric. However, this may be undesirable in some 
applications, and so in this section we consider an alternative family of broadcast gossip algorithms. 
These algorithms are no longer guaranteed to converge to an average consensus, and so we refer to 
them as biased broadcast gossip algorithms (BBGAs). However, we still guarantee convergence in 
expectation and in the mean-squared sense to a characterizable value which depends on the initial 
state at each node and the structure of the network. 

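Biased broadcast gossip algorithms are obtained by setting

$$d_j^{(k)} = \begin{cases} 1/\delta_j^+ & \text{if } j \in \mathcal{N}_k^-, \\ 0 & \text{otherwise,} \end{cases} \qquad (61)$$

$$B_{j,k} = \begin{cases} 1/\delta_j^+ & \text{if } j \in \mathcal{N}_k^-, \\ 0 & \text{otherwise,} \end{cases} \qquad (62)$$

and taking $a_{j,k}$ to be any values which satisfy the constraints (7). To implement such a scheme we
only require that each node has knowledge of its in-neighbors, which is reasonable in the broadcast
setting.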

Observe that, for the choice of parameters just specified, both $I - L_k$ and $S_k$ are row-stochastic
matrices. Let $v$ be such that $v^T B = v^T$ and $v^T \mathbf{1} = 1$. Thus, the entries of $v$ satisfy
$v_k = \sum_{j \in \mathcal{N}_k^-} v_j / \delta_j^+$ and all entries of $v$ are positive. Such an eigenvector exists since $B$ is also
row-stochastic. One can verify that $v^T S_k = v^T$ also holds, and so $v^T \bar{S} = v^T$. In general, we do not
have $v = \frac{1}{n}\mathbf{1}$ unless $\delta_j^- = \delta_j^+$ for all $j$ and $\delta_i^+ = \delta_j^+$ for all $i \neq j$. Therefore convergence to the
average consensus can no longer be guaranteed in general. However, we still obtain convergence
in expectation to a (non-average) consensus, via Theorem 1, and we can also show that BBGA
converges in the second moment.
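To illustrate why the BBGA consensus value is generally not the average, the sketch below computes the weight vector $v$ from the fixed-point relation $v_k = \sum_{j \in \mathcal{N}_k^-} v_j/\delta_j^+$ on a small digraph. The three-node graph, the function name, and the use of a lazy power iteration are assumptions made for this example; they are not taken from the paper.

```python
def bbga_consensus_weights(in_neighbors, iterations=2000):
    """Approximate v with v^T B = v^T and sum(v) = 1.

    in_neighbors[j] lists the nodes k whose broadcasts node j receives.
    B[j][k] = 1/delta_j^+ for each such k, so every row of B sums to one.
    """
    n = len(in_neighbors)
    B = [[0.0] * n for _ in range(n)]
    for j, senders in enumerate(in_neighbors):
        for k in senders:
            B[j][k] = 1.0 / len(senders)
    v = [1.0 / n] * n
    for _ in range(iterations):
        vB = [sum(v[j] * B[j][k] for j in range(n)) for k in range(n)]
        # Lazy step: averaging with the previous iterate avoids oscillation
        # on periodic graphs and leaves the fixed point unchanged.
        v = [0.5 * (v[k] + vB[k]) for k in range(n)]
    return v


# Hypothetical strongly connected digraph:
# node 0 hears nodes 1 and 2, node 1 hears node 0, node 2 hears node 1.
in_neighbors = [[1, 2], [0], [1]]
v = bbga_consensus_weights(in_neighbors)

x0 = [3.0, 0.0, 6.0]
biased_limit = sum(vi * xi for vi, xi in zip(v, x0))  # v^T x(0)
average = sum(x0) / len(x0)
```

For this graph the weights are not uniform, so $v^T x(0)$ differs from the average of the initial values, which is the bias that gives BBGA its name.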



Proposition 4. Let the parameters $d_j^{(k)}$ and $B_{j,k}$ be chosen as in (61) and (62). There exists
$\eta > 0$ such that if $\epsilon \in (0, \eta]$ then (39) holds and so (38) also holds with $v$ being the vector such that
$v^T B = v^T$ and $v^T \mathbf{1} = 1$.

Proof. To prove the claim we show that (39) holds and then invoke Theorem 2. We use an argument
similar to that used in the proof of Proposition 3. Since $(I - L_k)\mathbf{1} = \mathbf{1}$, $S_k \mathbf{1} = \mathbf{1}$, and
$v^T S_k = v^T$, the following four equalities hold:

$$E[(I - L_k) \otimes (I - L_k)]\, (\mathbf{1} \otimes \mathbf{1}) = (\mathbf{1} \otimes \mathbf{1}) \qquad (63)$$
$$E[(I - L_k) \otimes S_k]\, (\mathbf{1} \otimes \mathbf{1}) = (\mathbf{1} \otimes \mathbf{1}) \qquad (64)$$
$$E[S_k \otimes (I - L_k)]\, (\mathbf{1} \otimes \mathbf{1}) = (\mathbf{1} \otimes \mathbf{1}) \qquad (65)$$
$$(v^T \otimes v^T)\, E[S_k \otimes S_k] = (v^T \otimes v^T). \qquad (66)$$

Since the eigenvectors above are all positive, it follows that 1 is the largest eigenvalue of each of
the four matrices $E[(I - L_k) \otimes (I - L_k)]$, $E[(I - L_k) \otimes S_k]$, $E[S_k \otimes (I - L_k)]$, and $E[S_k \otimes S_k]$.

Similar to the proof of Proposition 3, we find that $\mathcal{M}$ has largest eigenvalue 1 with multiplicity
4, and the moduli of all other eigenvalues are strictly smaller than 1. Let $\lambda_1(\epsilon), \dots, \lambda_4(\epsilon)$ denote
the four corresponding eigenvalues of $E[W \otimes W]$. In this case, we can solve for the eigenvalues and
again take their derivatives to find that

$$\frac{d\lambda_1(\epsilon)}{d\epsilon} = 0, \qquad (67)$$

$$\frac{d\lambda_2(\epsilon)}{d\epsilon} = \frac{d\lambda_3(\epsilon)}{d\epsilon} = -v_1^T E[D] \mathbf{1} < 0, \qquad (68)$$

$$\frac{d\lambda_4(\epsilon)}{d\epsilon} = -2 v_1^T E[D] \mathbf{1} < 0, \qquad (69)$$

where $v_1$ satisfies $v_1^T E[I - L_k] = v_1^T$ and $v_1^T \mathbf{1} = 1$. Thus, there exists a positive scalar $\eta > 0$
such that 1 is a simple eigenvalue of $E[W \otimes W]$ and all other eigenvalues have moduli strictly less
than 1 when $\epsilon \in (0, \eta]$. Consequently, (39) holds, and convergence in the second moment follows from
Theorem 2. □



7 Upper Bound on $\eta$

So far we have demonstrated that there exist broadcast gossip algorithms of the form described
in Section 2 which are guaranteed to converge when the parameter $\epsilon$ is chosen to be sufficiently
small. In this section we derive bounds on $\eta$ which can be used as practical guidelines for setting
this parameter. Previous results suggest that, in general, one must take $\eta = O(n^{-n})$, which is
extremely conservative [14, 20, 21]. The bounds in this section make use of the specific structure of
$W(t)$ to obtain tighter, more useful bounds.

We begin with a simple observation related to the expected BBGA update matrix. 

Lemma 6. For updates using the BBGA parameters and for sufficiently small $\epsilon > 0$, the second
largest eigenvalue of $\bar{W}$ is $1 - \epsilon/n$.

Proof. One can verify that, for BBGA, $\bar{W} [\mathbf{1}^T \; \mathbf{0}^T]^T = [\mathbf{1}^T \; \mathbf{0}^T]^T$ and
$\bar{W} [\mathbf{1}^T \; {-\mathbf{1}^T}]^T = (1 - \epsilon/n) [\mathbf{1}^T \; {-\mathbf{1}^T}]^T$. Thus, 1 and $1 - \epsilon/n$ are eigenvalues of $\bar{W}$. According to
Lemma 1, the eigenvalues of $\bar{W}$ are continuous functions of $\epsilon$. If 1 or $1 - \epsilon/n$ were not eigenvalues
stemming from the semi-simple double eigenvalue 1 of $\bar{W}_0$, then we would obtain a contradiction as
$\epsilon \to 0$. Therefore, $1 - \epsilon/n$ must be the second largest eigenvalue of $\bar{W}$ for sufficiently small $\epsilon$. □

In order to provide a tight characterization of the upper bound $\eta$ on the perturbation parameter,
we need to make more specific assumptions about the values of the weights $a_{j,k}$. Previously, we only
assumed that they satisfy the constraints (7). For the remainder of the article, unless otherwise
stated, we assume that

$$a_{j,k} = \begin{cases} 1/\delta_j^+ & \text{if } j \in \mathcal{N}_k^-, \\ 0 & \text{otherwise.} \end{cases} \qquad (70)$$

In this case, for BBGA, we have $\bar{S} = I - \bar{L}$.

Let $\xi_k$ denote the eigenvalues of the graph Laplacian $L = \mathrm{diag}(A\mathbf{1}) - A$, sorted by increasing
real part [22], where each eigenvalue is counted with multiplicity; i.e.,
$0 = \mathrm{Re}(\xi_1) \leq \mathrm{Re}(\xi_2) \leq \cdots \leq \mathrm{Re}(\xi_n) \leq 2$.
Note that this last inequality holds because $\rho(L) \leq \|L\|_\infty = 2$, and the real part of $\xi_k$ for any $k$
is nonnegative because $L$ is diagonally dominant. We have the following lemma characterizing the
eigenvalues of $\bar{W}$ for BBGA.
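The two properties of $L$ used above (zero row sums and $\|L\|_\infty = 2$) can be checked directly on a small example. The sketch below assumes, for illustration only, that each node weights its in-neighbors equally, so that $A_{j,k} = 1/\delta_j^+$ whenever $j$ receives broadcasts from $k$; the three-node digraph and the function names are hypothetical.

```python
def weighted_laplacian(in_neighbors):
    """Build L = diag(A 1) - A with A[j][k] = 1/delta_j^+ for each in-neighbor k of j."""
    n = len(in_neighbors)
    A = [[0.0] * n for _ in range(n)]
    for j, senders in enumerate(in_neighbors):
        for k in senders:
            A[j][k] = 1.0 / len(senders)
    return [[(sum(A[j]) if j == k else 0.0) - A[j][k] for k in range(n)]
            for j in range(n)]


def infinity_norm(matrix):
    """Maximum absolute row sum of a matrix."""
    return max(sum(abs(entry) for entry in row) for row in matrix)


# Hypothetical digraph: node 0 hears nodes 1 and 2, node 1 hears 0, node 2 hears 1.
L = weighted_laplacian([[1, 2], [0], [1]])
row_sums = [sum(row) for row in L]
norm = infinity_norm(L)
```

Under this weighting every diagonal entry of $L$ equals 1 and the off-diagonal entries of each row sum to $-1$, which is where the bound $\rho(L) \leq 2$ comes from.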

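Lemma 7. For BBGA, the $2n$ eigenvalues of $\bar{W}$ are

$$\lambda_{k,1} = 1 - \frac{\epsilon}{2n} - \frac{1}{n}\xi_k - \frac{1}{n}\sqrt{\frac{\epsilon^2}{4} + \epsilon\xi_k}, \quad \text{for } k = 1, \dots, n, \qquad (71)$$

$$\lambda_{k,2} = 1 - \frac{\epsilon}{2n} - \frac{1}{n}\xi_k + \frac{1}{n}\sqrt{\frac{\epsilon^2}{4} + \epsilon\xi_k}, \quad \text{for } k = 1, \dots, n. \qquad (72)$$

Proof. Observe that (61) implies $\bar{D} = \frac{1}{n} I$ for BBGA. The characteristic polynomial of $\bar{W}$ is

$$\det(\lambda I - \bar{W}) = \det \begin{bmatrix} (\lambda - 1) I + \frac{1}{n} L & -\frac{\epsilon}{n} I \\ -\frac{1}{n} L & \left(\lambda - 1 + \frac{\epsilon}{n}\right) I + \frac{1}{n} L \end{bmatrix} \qquad (73)$$

$$= \det\left( (\lambda - 1)\left(\lambda - 1 + \frac{\epsilon}{n}\right) I + \frac{2(\lambda - 1)}{n} L + \frac{1}{n^2} L^2 \right), \qquad (74)$$

where the last equality follows since the matrices $I$ and $L$, as well as any linear combination of
them, commute. The zeros of the characteristic polynomial of $\bar{W}$ thus correspond to zero
eigenvalues of the matrix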



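$$U = (\lambda - 1)\left(\lambda - 1 + \frac{\epsilon}{n}\right) I + \frac{2(\lambda - 1)}{n} L + \frac{1}{n^2} L^2. \qquad (75)$$

According to the spectral mapping theorem [15], for any eigenvalue $\xi_j$ of the matrix $L$, the matrix
$U$ has a corresponding eigenvalue $(\lambda - 1)(\lambda - 1 + \frac{\epsilon}{n}) + \frac{2(\lambda - 1)}{n}\xi_j + \frac{1}{n^2}\xi_j^2$. Therefore, given an eigenvalue
$\lambda_k$ of $\bar{W}$, there must exist an eigenvalue $\xi_j$ of $L$ such that

$$(\lambda_k - 1)\left(\lambda_k - 1 + \frac{\epsilon}{n}\right) + \frac{2(\lambda_k - 1)}{n}\xi_j + \frac{1}{n^2}\xi_j^2 = 0. \qquad (76)$$

According to Lemma 1, $\lambda_k$ is a continuous function of $\epsilon$, and so as $\epsilon \to 0$, $\lambda_k$ is an eigenvalue of
$\bar{W}_0$. For the particular choice of parameters made above, we have that $I - \bar{L} = \bar{S}$. Since $\xi_k$ is
an eigenvalue of $L$, we have that $1 - \xi_k/n$ is an eigenvalue of $I - \bar{L} = \bar{S} = I - \frac{1}{n}L$. Thus, in the
continuous limit as $\epsilon \to 0$, we have that $\lambda_k = 1 - \xi_k/n$ is an eigenvalue of $\bar{W}_0$. Taking $\epsilon = 0$ and
substituting for $\lambda_k$ in (76), we find that $(\xi_j - \xi_k)^2 = 0$, which implies that $j = k$. In this case, (76)
can be simplified to

$$(\lambda_k - 1)\left(\lambda_k - 1 + \frac{\epsilon}{n}\right) + \frac{2(\lambda_k - 1)}{n}\xi_k + \frac{1}{n^2}\xi_k^2 = 0. \qquad (77)$$

Solving for the roots of these quadratic polynomials in $\lambda_k$ proves the claim. □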
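The closed-form roots of the quadratic in this proof are easy to evaluate numerically. The following is a minimal sketch, assuming the quadratic $(\lambda - 1)(\lambda - 1 + \epsilon/n) + 2(\lambda - 1)\xi/n + \xi^2/n^2 = 0$; the function names and the sample values of $n$, $\epsilon$ and $\xi$ are illustrative. `cmath` is used so that complex Laplacian eigenvalues are also handled.

```python
import cmath


def bbga_expected_eigenvalues(n, eps, xi):
    """Return the two roots (lambda_{k,1}, lambda_{k,2}) for one Laplacian eigenvalue xi."""
    root = cmath.sqrt(eps * eps / 4.0 + eps * xi)
    base = 1.0 - eps / (2.0 * n) - xi / n
    return base - root / n, base + root / n


def quadratic_residual(lam, n, eps, xi):
    """Left-hand side of the quadratic; it vanishes when lam is a root."""
    return ((lam - 1.0) * (lam - 1.0 + eps / n)
            + 2.0 * (lam - 1.0) * xi / n + (xi / n) ** 2)


# Illustrative values: n = 16 nodes, eps = 0.3, Laplacian eigenvalue xi = 0.5335.
lam_1, lam_2 = bbga_expected_eigenvalues(16, 0.3, 0.5335)
res_1 = quadratic_residual(lam_1, 16, 0.3, 0.5335)
res_2 = quadratic_residual(lam_2, 16, 0.3, 0.5335)

# For xi = 0 the roots reduce to 1 - eps/n and 1.
zero_1, zero_2 = bbga_expected_eigenvalues(16, 0.3, 0.0)
```

Sweeping `xi` over the spectrum of $L$ gives all $2n$ eigenvalues of the expected update matrix without forming that matrix explicitly.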
We have already seen that $L$ has an eigenvalue $\xi_1 = 0$. From Lemma 7 we again find that
$\lambda_{1,1} = 1 - \epsilon/n$ and $\lambda_{1,2} = 1$ are eigenvalues of $\bar{W}$. If all eigenvalues of $L$ are real, then all eigenvalues
of $\bar{W}$ are also real. In this case, we can use the monotonic ordering of eigenvalues to determine an
upper bound $\eta$ on $\epsilon$ to ensure that BBGA converges in expectation. We restrict to the case where
all eigenvalues of $L$ are real. When $L$ has some complex eigenvalues, monotonicity is no longer
preserved, making it difficult to determine a reasonable bound. The main result of this section is
as follows.

Proposition 5. Consider BBGA updates and suppose that all eigenvalues of $L$ are real. Then
BBGA converges in expectation when $0 < \epsilon < 2n + \xi_n^2/(2n) - 2\xi_n$, where $\xi_n$ is the largest eigenvalue
of $L$.

The proof of Proposition 5 relies on two intermediate lemmas.

Lemma 8. For BBGA updates, if all eigenvalues of $L$ are real, then there is one stable eigenvalue
$\lambda_{1,2} = 1$. The eigenvalues $\lambda_{k,1}$ are strictly decreasing functions of $\epsilon$, and the remaining eigenvalues
$\lambda_{k,2}$, $k \geq 2$, are strictly increasing functions of $\epsilon$.



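Proof. To prove the claim, differentiate (71) and (72) with respect to $\epsilon$:

$$\frac{d\lambda_{k,1}}{d\epsilon} = -\frac{1}{2n}\left(1 + \frac{2\xi_k + \epsilon}{\sqrt{\epsilon^2 + 4\epsilon\xi_k}}\right) \qquad (78)$$

$$\leq -\frac{1}{2n}\left(1 + \frac{2\xi_k + \epsilon}{\sqrt{(2\xi_k + \epsilon)^2}}\right) \qquad (79)$$

$$< 0, \qquad (80)$$

and

$$\frac{d\lambda_{k,2}}{d\epsilon} = -\frac{1}{2n}\left(1 - \frac{2\xi_k + \epsilon}{\sqrt{\epsilon^2 + 4\epsilon\xi_k}}\right) \qquad (81)$$

$$= \frac{1}{2n}\left(\frac{2\xi_k + \epsilon}{\sqrt{\epsilon^2 + 4\epsilon\xi_k}} - 1\right) \qquad (82)$$

$$\geq \frac{1}{2n}\left(\frac{2\xi_k + \epsilon}{\sqrt{(2\xi_k + \epsilon)^2}} - 1\right) \qquad (83)$$

$$= 0. \qquad (84)$$

Note that equality holds in (83) if and only if $\xi_k = 0$. □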



Lemma 9. For BBGA updates, if all eigenvalues of $L$ are real, then the eigenvalues $\lambda_{k,1}$ and $\lambda_{k,2}$
are monotonic decreasing functions of $\xi_k$.

Proof. It is clear from (71) that $\lambda_{k,1}$ is a monotonic decreasing function of $\xi_k$. For $\lambda_{k,2}$ and for
fixed $\epsilon > 0$, observe that

$$\frac{d\lambda_{k,2}}{d\xi_k} = -\frac{1}{n} + \frac{1}{n\sqrt{4\xi_k/\epsilon + 1}} \leq 0, \qquad (85)$$

where the inequality follows since $\xi_k \geq 0$. Therefore, $\lambda_{k,2}$ is also a monotonic decreasing function
of $\xi_k$. □
Proof of Proposition 5. Under the assumption that the eigenvalues of $L$ are nonnegative real
numbers, we have

$$\frac{\epsilon}{2} \leq \sqrt{\frac{\epsilon^2}{4} + \epsilon\xi_k} \leq \sqrt{\left(\xi_k + \frac{\epsilon}{2}\right)^2} = \frac{\epsilon}{2} + \xi_k. \qquad (86)$$

Substituting (86) into (71) and (72), we get

$$1 - \frac{\epsilon}{n} - \frac{2\xi_k}{n} \leq \lambda_{k,1} \leq 1 - \frac{\epsilon}{n} - \frac{\xi_k}{n}, \qquad (87)$$

$$1 - \frac{\xi_k}{n} \leq \lambda_{k,2} \leq 1, \qquad (88)$$

where the equalities hold if and only if $k = 1$ with corresponding $\xi_1 = 0$. From these expressions,
it is clear that $\lambda_{1,2} = 1$ is a simple eigenvalue of $\bar{W}$ and all other eigenvalues are strictly smaller
than 1 since $0 \leq \xi_k \leq 2$ and $\epsilon > 0$. Furthermore, convergence is guaranteed when all eigenvalues
are strictly larger than $-1$. For a given $k$, observe that $\lambda_{k,1} < \lambda_{k,2}$. In addition, from Lemma 9
we have that $\lambda_{k,1}$ is a monotonic decreasing function of $\xi_k$. Therefore, $\lambda_{k,2} > \lambda_{k,1} \geq \lambda_{n,1}$ for all
$k = 1, \dots, n$. Thus, we focus on determining conditions under which $\lambda_{n,1} > -1$. According to
Lemma 8, $\lambda_{n,1}$ is a strictly decreasing function of $\epsilon$. If there exists $\eta$ such that $\lambda_{n,1} = -1$ when
$\epsilon = \eta$, then $\lambda_{n,1} > -1$ when $\epsilon < \eta$. Solving (71) for $\lambda_{n,1} = -1$ we obtain

$$\eta = 2n + \frac{\xi_n^2}{2n} - 2\xi_n, \qquad (89)$$

which completes the proof. □

Remark 3. In general, the value of $\xi_n$ depends on the network topology, and it may not be easy
to determine a precise value of $\xi_n$. A more practical guideline is to take $\epsilon \in (0, \frac{2}{n}(n-1)^2)$. To see
why this is reasonable, differentiate (89) with respect to $\xi_n$,

$$\frac{d\eta}{d\xi_n} = \frac{\xi_n}{n} - 2 \leq \frac{2}{n} - 2 < 0. \qquad (90)$$

Therefore, $\eta$ is a monotonic decreasing function of $\xi_n$, and $\eta$ thus satisfies $\frac{2}{n}(n-1)^2 \leq \eta \leq 2n$ since
$0 \leq \xi_n \leq 2$. If the perturbation parameter $\epsilon$ is smaller than $\frac{2}{n}(n-1)^2$ then BBGA is guaranteed
to converge in expectation.
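The bound $\eta = 2n + \xi_n^2/(2n) - 2\xi_n$ and the topology-free guideline $\frac{2}{n}(n-1)^2$ are simple to evaluate. The sketch below uses hypothetical function names; the sample inputs $n = 16$ and $\xi_n = 1.3796$ are the values reported for the 16-node example graph in Section 8.1.

```python
def bbga_epsilon_upper_bound(n, xi_n):
    """Upper bound eta on epsilon, given the largest Laplacian eigenvalue xi_n."""
    return 2.0 * n + xi_n ** 2 / (2.0 * n) - 2.0 * xi_n


def conservative_epsilon_bound(n):
    """Bound obtained from the worst case xi_n = 2; needs no spectral information."""
    return 2.0 * (n - 1) ** 2 / n


eta = bbga_epsilon_upper_bound(16, 1.3796)   # approximately 29.30
eta_safe = conservative_epsilon_bound(16)    # 28.125
```

Since $\eta$ decreases in $\xi_n$, the conservative value never exceeds the topology-specific one, so it can be used when the spectrum of $L$ is unknown.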

Note that, from Remark 3, the upper bound $\eta$ is not smaller than 1. In the following section we
investigate what value of $\epsilon$ leads to the fastest convergence. We find that we typically seek values
of $\epsilon$ less than 1, and so this upper bound will suffice.

Although the guidelines derived above are for BBGA, in extensive simulations we have observed 
that the maximal value of e under which UBGA still converges is typically no different than that for 
BBGA for a given graph. Therefore, the guidelines derived above can be also used as approximate 
guidelines for setting the parameters of UBGA. 

8 Optimal Perturbation Parameter 

In the previous section we determined an upper bound on the perturbation parameter $\epsilon$ under
which convergence in expectation is guaranteed. This can be viewed as a sort of stability result.
In this section we investigate what value of the perturbation parameter leads to the fastest rate of
convergence. It is well known that the worst-case rate of convergence of systems of the form (13)
is governed by the second largest eigenvalue of $\bar{W}$. In the previous section we saw that, for BBGA,
this second largest eigenvalue is $1 - \epsilon/n$ if the perturbation parameter $\epsilon > 0$ is sufficiently small,
and this eigenvalue is a monotonic decreasing function of $\epsilon$. At the same time, other eigenvalues
of $\bar{W}$ are monotonic increasing, and so it follows that the optimal value of $\epsilon$ is the one where the
modulus of $1 - \epsilon/n$ first coincides with the modulus of another eigenvalue of $\bar{W}$.

Theorem 3. Consider the expected update matrix $\bar{W}$ corresponding to BBGA, and suppose that
all eigenvalues of the Laplacian $L = \mathrm{diag}(A\mathbf{1}) - A$ are real. For networks with at least three nodes,
the modulus of the second largest eigenvalue of $\bar{W}$ is minimized when the perturbation parameter is
equal to $\epsilon^* = \xi_2/2$, where $\xi_2$ is the second smallest eigenvalue of the graph Laplacian $L$. In this case,
the second largest eigenvalue of $\bar{W}$ is $1 - \xi_2/(2n)$. When $n = 2$, the second largest eigenvalue of $\bar{W}$
is minimized by $\epsilon^* = 2 - \sqrt{2}$.

Proof. According to Lemma 9, for a fixed $\epsilon > 0$, the eigenvalues of $\bar{W}$ satisfy

$$1 - \epsilon/n = \lambda_{1,1} \geq \lambda_{2,1} \geq \cdots \geq \lambda_{n,1}, \qquad (91)$$

and

$$1 = \lambda_{1,2} \geq \lambda_{2,2} \geq \cdots \geq \lambda_{n,2}. \qquad (92)$$

We also have monotonicity of the respective eigenvalues as a function of $\epsilon$ from Lemma 8. Because
the eigenvalues are continuous functions of $\epsilon$, it follows that there are two points of interest where
the second largest eigenvalue (in modulus) may switch from being $\lambda_{1,1} = 1 - \epsilon/n$. These are the
points $\epsilon_1$ where $\lambda_{1,1}(\epsilon_1) = \lambda_{2,2}(\epsilon_1)$ and $\epsilon_2$ where $\lambda_{1,1}(\epsilon_2) = -\lambda_{n,1}(\epsilon_2)$. To complete the proof we
can solve for these two values of $\epsilon$ and then determine that $\epsilon^* = \min\{\epsilon_1, \epsilon_2\}$.

Solving $\lambda_{1,1}(\epsilon_1) = \lambda_{2,2}(\epsilon_1)$, we have $\epsilon_1 = \xi_2/2$ and the corresponding eigenvalue of $\bar{W}$ is
$1 - \xi_2/(2n)$.

To solve $\lambda_{1,1}(\epsilon_2) = -\lambda_{n,1}(\epsilon_2)$, observe that

$$\epsilon_2 = \frac{3n - \xi_n - \sqrt{n^2 + 2n\xi_n - \xi_n^2}}{2} \qquad (93)$$

$$> \frac{3n - \xi_n - \sqrt{(n + \xi_n)^2}}{2} \qquad (94)$$

$$= n - \xi_n. \qquad (95)$$

Since $0 < \xi_n \leq 2$, it must be that $\epsilon_2 > 1$ if $n \geq 3$, from which we find that $\epsilon_2 > \epsilon_1$ since $\xi_2/2 \leq 1$.
Therefore, the optimal perturbation parameter is $\epsilon^* = \epsilon_1 = \xi_2/2$ when $n \geq 3$. If $n = 2$, then there
is only one non-zero eigenvalue of the weighted Laplacian matrix $L$ and it is equal to 2. In this
case, $\epsilon_2 = 2 - \sqrt{2} < 1 = \epsilon_1$, so the optimal perturbation parameter is $\epsilon^* = 2 - \sqrt{2}$. □
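The two candidate crossing points in this proof can be evaluated directly. The sketch below is an illustration under the stated closed forms $\epsilon_1 = \xi_2/2$ and $\epsilon_2 = (3n - \xi_n - \sqrt{n^2 + 2n\xi_n - \xi_n^2})/2$; the function names are hypothetical, and the sample values $n = 16$, $\xi_2 = 0.5335$, $\xi_n = 1.3796$ are those reported for the example graph in Section 8.1.

```python
import math


def epsilon_candidates(n, xi_2, xi_n):
    """Return (eps_1, eps_2), the two points where the dominant eigenvalue may switch."""
    eps_1 = xi_2 / 2.0
    eps_2 = (3.0 * n - xi_n - math.sqrt(n * n + 2.0 * n * xi_n - xi_n ** 2)) / 2.0
    return eps_1, eps_2


def optimal_epsilon(n, xi_2, xi_n):
    """Optimal perturbation parameter: the smaller of the two candidates."""
    return min(epsilon_candidates(n, xi_2, xi_n))


def second_largest_eigenvalue(n, eps):
    """Value of 1 - eps/n, the second largest eigenvalue for small eps."""
    return 1.0 - eps / n


eps_star = optimal_epsilon(16, 0.5335, 1.3796)      # xi_2 / 2 = 0.26675
rate = second_largest_eigenvalue(16, eps_star)      # 1 - xi_2 / (2 n)

# Two-node case: the only non-zero Laplacian eigenvalue is 2.
eps_star_two_nodes = optimal_epsilon(2, 2.0, 2.0)   # 2 - sqrt(2)
```

For $n \geq 3$ the first candidate is always the smaller one, so only $\xi_2$ is needed in practice; the second candidate matters only in the two-node case.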

Remark 4. Note that since the modulus of the second largest eigenvalue of $\bar{W}$ satisfies
$|\lambda_{1,1}(\epsilon^*)| = \lambda_{1,1}(\epsilon^*) < \lambda_{1,1}(0) \leq 1$, we see that BBGA is guaranteed to converge in expectation
for this setting.

The above analysis focused on the case where the eigenvalues of $L$ are assumed to be real. In
extensive simulations, we have observed that this is the case whenever $\mathcal{G}$ is undirected, regardless
of whether the edge weights are symmetric. For digraphs, the eigenvalues $\xi_j$ of $L$ are generally
complex numbers, so a monotonicity property such as that obtained in Lemma 9 is no longer
readily available. Below we analyze the optimal value of the perturbation parameter on random
digraphs via simulation. We find that $\epsilon = \mathrm{Re}(\xi_2)/2$ is a good guideline for directed graphs.
Figure 1: An example of an undirected graph with 16 nodes.

Figure 2: Number of broadcasts to converge with respect to $\epsilon$ for simulations on the graph shown
in Fig. 1.

8.1 Undirected Graphs 

Consider an undirected graph as illustrated in Figure 1 with 16 nodes distributed uniformly in
the unit square. Nodes are connected if the Euclidean distance between them is no more than
$\sqrt{2 \log n / n}$, so that the graph $\mathcal{G}$ is connected with probability at least $1 - 1/n^2$ [24]; this is
the standard random geometric graph model. For the graph shown in Fig. 1, the second smallest
eigenvalue of the weighted Laplacian matrix $L$ for BBGA is $\xi_2 = 0.5335$, so the optimal perturbation
parameter for BBGA is $\epsilon^* = \xi_2/2 = 0.2668$.
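The random geometric graph model described above can be sketched as follows. This is a minimal illustration, assuming uniform node placement in the unit square and the connectivity radius $\sqrt{2 \log n / n}$; the function names and the seed are arbitrary, and the particular graph of Fig. 1 is not reproduced.

```python
import math
import random
from collections import deque


def random_geometric_graph(n, seed=None):
    """Place n nodes uniformly in the unit square and link pairs within the radius."""
    rng = random.Random(seed)
    radius = math.sqrt(2.0 * math.log(n) / n)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= radius:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return points, neighbors, radius


def is_connected(neighbors):
    """Breadth-first search from node 0; True if every node is reached."""
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(neighbors)


points, neighbors, radius = random_geometric_graph(16, seed=0)
connected = is_connected(neighbors)
```

Because connectivity holds only with high probability, a simulation would typically redraw the node positions until `is_connected` returns `True`.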

Figure 3: An example for digraphs with 16 nodes. The gray lines denote undirected edges and lines
with one arrow denote directed edges.

Figure 2 shows the number of broadcasts to achieve consensus as a function of $\epsilon$. Each point
is an average over 100 trials, and we sweep over values of $\epsilon$ from 0.02 to 1 in increments of 0.02.
The initial values of all nodes are independent and uniformly distributed between 0 and 1. Recall
the error vector $m(t)$ defined in the proof of Theorem 2. We declare that consensus is achieved
at the first iteration $t$ where $\|m(t) - m(t-1)\|_2 < 10^{-5}$. Here we compare four broadcast gossip
algorithms. For BBGA we use the weights $a_{j,k}$ as defined in (70). The three versions of UBGA
have different choices of weights $a_{j,k}$; they are

'0.5 if j G Mf7 for UBGA-1, 
U+ ifie^-forUBGA-2, 
h 1/SJ if j G Afj~ for UBGA-3, 
if j ^ AT". 

Observe that the fastest convergence for BBGA occurs near $\epsilon = 0.26$, which matches the value
predicted by Theorem 3. Also observe that all three versions of UBGA have a larger optimal
perturbation parameter than BBGA. Using any version of UBGA at the optimal value $\epsilon^*$ for
BBGA results in suboptimal performance. UBGA-1 exhibits a number of advantages over the other
algorithms: it converges in fewer broadcasts than the other algorithms for suitably chosen $\epsilon$, and
the curve for UBGA-1 in Fig. 2 is extremely flat near the optimal value, so its performance is
very robust to the choice of $\epsilon$ in this region. From a practical perspective, UBGA-1 is also easy
to implement in undirected networks since all weights $a_{j,k}$ are constants only depending on the
network connectivity. For the graph shown in Fig. 1, the largest eigenvalue of $L$ for BBGA is
1.3796, and the corresponding upper bound for $\epsilon$ is 29.30, which can also be verified by simulation.



8.2 Strongly Connected Digraphs 

In practical wireless settings, not all links may be symmetric due to differing transmit powers (e.g., if
the batteries at different nodes have experienced different usage), multipath effects, or interference.
To simulate directed networks, we begin with an (undirected) random geometric graph and then
add and delete directed edges by random coin flips (while ensuring that the directed graph remains
strongly connected). Figure 3 illustrates an example of a strongly connected directed graph. In this
example, since the directed graph has fewer edges than the corresponding undirected graph shown
in Fig. 1, one would expect that more broadcasts are needed to achieve consensus. The real part of
the second smallest eigenvalue of $L$ for the directed graph in Fig. 3 is 0.3930, so the approximately
optimal perturbation parameter is $\epsilon^* = 0.1965$.

Fig. 4 illustrates the number of broadcasts with respect to the perturbation parameter $\epsilon$ required
to obtain $\|m(t) - m(t-1)\|_2 < 10^{-5}$. The initial values are independent and uniform over [0, 1].
The optimal perturbation parameter occurs at 0.20, which matches the predicted value well. From
this figure, we see that UBGA-1 still gives the best performance.

Figure 4: Number of broadcasts to converge with respect to $\epsilon$ for simulations on the directed graph
shown in Fig. 3.



8.3 Scaling Behavior 

In the previous two subsections, we illustrated the performance of UBGA and BBGA on particular
directed and undirected graphs. In this subsection, we demonstrate the scaling behavior of these
two algorithms as the size of the network increases. We compare the performance of three varieties
of UBGA and two varieties of BBGA. The three varieties of UBGA are those defined above,
with weights $a_{j,k}$ given in (96), and with $\epsilon = 0.5$. For BBGA, we use the same weights $a_{j,k}$ as given
in (70) and set $\epsilon$ either to 0.5 (BBGA-0.5) or to $\epsilon^*$ (BBGA-opt).

Following [6], we investigate two metrics for error. The UBGA algorithms are guaranteed
to converge to the average consensus solution, and we measure the mean squared error after $t$
iterations,

$$r(t) = \frac{1}{n} \left\| x(t) - \frac{1}{n} \mathbf{1}\mathbf{1}^T x(0) \right\|_2^2. \qquad (97)$$

Since BBGA and other biased broadcast gossip algorithms do not converge to the average consensus,
we measure their rate of convergence via the deviation,

$$q(t) = \frac{1}{n} \left\| x(t) - \frac{1}{n} \mathbf{1}\mathbf{1}^T x(t) \right\|_2^2, \qquad (98)$$

which is guaranteed to go to zero. Note that we do not include the companion variables $y(t)$ in
these calculations, since ultimately the aim is to reach consensus only on the states $x(t)$.
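The two error metrics can be computed with a few lines of code. The following is a minimal sketch, assuming both are normalized by the number of nodes $n$; the function names and the sample state vectors are hypothetical.

```python
def mean_squared_error(x_t, x_0):
    """r(t): mean squared distance of x(t) from the average of the initial values."""
    n = len(x_t)
    target = sum(x_0) / n
    return sum((value - target) ** 2 for value in x_t) / n


def deviation(x_t):
    """q(t): mean squared distance of x(t) from its own current average."""
    n = len(x_t)
    mean = sum(x_t) / n
    return sum((value - mean) ** 2 for value in x_t) / n


x_0 = [0.0, 3.0]        # hypothetical initial values, average 1.5
x_biased = [2.0, 2.0]   # hypothetical state: consensus on 2.0, not on the average

r_value = mean_squared_error(x_biased, x_0)  # 0.25, the bias is still visible
q_value = deviation(x_biased)                # 0.0, the nodes agree with each other
```

The example shows why the deviation is the appropriate measure for biased algorithms: it vanishes once the nodes agree, even when the agreed value is not the initial average.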

As above, we declare that consensus is achieved if $\|m(t) - m(t-1)\|_2 < 10^{-5}$. In the sequel,
we report the results of numerical simulations on random geometric graphs and random strongly
connected digraphs simulated using the procedure described above.



8.3.1 Convergence rate 

The number of transmissions required to achieve consensus for undirected graphs and digraphs
is shown in Figs. 5 and 6, respectively. Unsurprisingly, BBGA-opt converges significantly faster
than BBGA-0.5. However, UBGA-1 still scales best in terms of convergence rate. Since UBGA-1
also achieves the average consensus, we prefer UBGA-1 if out-degree information is available;
otherwise, BBGA-opt is the better choice.
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Figure 5: Number of broadcasts to converge with respect to N on undirected graphs. 





Figure 7: The standard deviation performance with respect to N on undirected graphs. 



8.3.2 Deviation 

Figures [7] and [8] show the deviation q(t) at the time the algorithm is declared to have converged. 
For this particular initialization scheme (i.i.d. uniform), we note that the BBGA algorithms are 
roughly half an order of magnitude worse than the UBGA schemes. We investigate the effects 
of initialization on deviation further in the next section. For now, we note that both versions of 
BBGA achieve comparable performance in terms of deviation, and likewise, all three versions of 
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Figure 8: The standard deviation performance with respect to N on digraphs. 

UBGA achieve effectively the same deviation at the time they converge. 

In summary, from the experiments reported in this section we conclude that UBGA-1 is the
most desirable solution if the out-degree information is available (including when $\mathcal{G}$ is
undirected); otherwise BBGA-opt is the next most preferable since it gives the fastest rate of convergence.

9 Performance Analysis 

In this section, we compare the broadcast gossip algorithms proposed in this paper with the previous
broadcast gossip algorithms of [6] and [7]. In the figures and discussion below, BGA-1 refers to
the algorithm in [6], BGA-2 refers to the one in [7], and BBGA and UBGA are the algorithms
proposed in this paper. The previous section illustrated that UBGA-1 exhibits many advantages,
both in terms of the choice of perturbation parameter and the rate of convergence, compared to
the other UBGA algorithms. For this reason, in this section we use UBGA-1 as the representative
of the UBGAs. For both UBGA and BBGA, we investigate two settings for the perturbation
parameter: $\epsilon = 0.5$ and $\epsilon^*$. Note that $\epsilon^*$ is only optimal for BBGA, and it may be
suboptimal for UBGA. We also remark that the comparison of BGA-1 and BGA-2 with BBGA-opt and UBGA-opt
(i.e., those using $\epsilon^*$) is unfair, since the information used to determine $\epsilon^*$ is not made
available to either BGA-1 or BGA-2; in particular, neither of those algorithms uses global topology
information such as $\lambda_2$. This is our primary motivation for also considering the performance of
UBGA and BBGA with $\epsilon = 0.5$.

All simulations in this section use (undirected) random geometric graph topologies with the
same connectivity radius as in the previous section. When $\mathcal{G}$ is directed, BGA-1 is no longer
guaranteed to converge to the average consensus in expectation. Unless otherwise noted, each
result corresponds to the average over 100 Monte Carlo trials.
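The experimental setup can be sketched as follows. This is an illustrative reconstruction, not the authors' simulation code: the helper names `random_geometric_graph` and `is_connected` are hypothetical, and the radius value 0.3 is a placeholder assumption, since the connectivity radius actually used is specified in the previous section and not repeated here.

```python
import numpy as np

def random_geometric_graph(n, radius, rng):
    """Place n nodes uniformly in the unit square; link pairs within `radius`."""
    positions = rng.uniform(0.0, 1.0, size=(n, 2))
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Symmetric boolean adjacency matrix without self-loops.
    adjacency = (dist <= radius) & ~np.eye(n, dtype=bool)
    return positions, adjacency

def is_connected(adjacency):
    """Depth-first search from node 0 on an undirected adjacency matrix."""
    n = adjacency.shape[0]
    seen = np.zeros(n, dtype=bool)
    seen[0] = True
    stack = [0]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(adjacency[i] & ~seen):
            seen[j] = True
            stack.append(int(j))
    return bool(seen.all())

# Monte Carlo loop: draw 100 topologies and count the connected ones.
rng = np.random.default_rng(0)
n_trials, n_nodes, radius = 100, 50, 0.3  # radius is a placeholder assumption
connected = 0
for _ in range(n_trials):
    _, adjacency = random_geometric_graph(n_nodes, radius, rng)
    connected += is_connected(adjacency)
print(f"{connected} of {n_trials} sampled graphs are connected")
```

In an actual experiment one would typically redraw disconnected samples, since consensus cannot be reached on a disconnected graph.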

Since the initial values affect the performance of the various broadcast gossip algorithms, we
consider four approaches to initializing the values $x_i(0)$: 1) independent and uniform over [0,1]; 2)
independent and Gaussian with zero mean and unit variance; 3) the spike initialization, where one
random node has an initial value of 1 and all other nodes have initial value 0; and 4) the slope
initialization, where the initial value at node i is the sum of its x- and y-coordinates in the unit
square (note that all graphs are drawn from the ensemble of random geometric graphs in the plane).
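The four initialization schemes can be sketched as below. The function name `initial_values`, the string labels, and the random seed are hypothetical choices made for illustration; only the four distributions themselves come from the text.

```python
import numpy as np

def initial_values(kind, positions, rng):
    """Return x(0) for one of the four initialization schemes.

    positions: (n, 2) array of node coordinates in the unit square
    (only used by the slope initialization).
    """
    n = positions.shape[0]
    if kind == "uniform":
        # 1) i.i.d. uniform over [0, 1]
        return rng.uniform(0.0, 1.0, size=n)
    if kind == "gaussian":
        # 2) i.i.d. Gaussian, zero mean and unit variance
        return rng.standard_normal(n)
    if kind == "spike":
        # 3) one randomly chosen node holds 1, all others hold 0
        x0 = np.zeros(n)
        x0[rng.integers(n)] = 1.0
        return x0
    if kind == "slope":
        # 4) sum of each node's x- and y-coordinates
        return positions[:, 0] + positions[:, 1]
    raise ValueError(f"unknown initialization: {kind}")

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(50, 2))
for kind in ("uniform", "gaussian", "spike", "slope"):
    x0 = initial_values(kind, positions, rng)
    print(kind, x0.shape, round(float(x0.mean()), 3))
```

Note that the slope initialization is the only one that is spatially correlated with the graph, since neighboring nodes in a random geometric graph have similar coordinates.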

Figure 9: The deviation of BGA-1, BGA-2, BBGA and UBGA with respect to the number of
broadcasts on undirected random geometric graphs with uniform distribution for initial values. 




(a) n = 50 (b) n = 100 (c) n = 500 



Figure 10: The deviation of BGA-1, BGA-2, BBGA and UBGA with respect to the number of 
broadcasts on undirected random geometric graphs with Gaussian initial values. 



9.1 Deviation 

Figures 9-12 show the deviation $q(t) = \frac{1}{n} \| x(t) - \frac{1}{n} \mathbf{1}\mathbf{1}^T x(t) \|_2^2$ as a
function of time for different initializations. Note that this indicates how quickly the algorithms
converge to a consensus, regardless of the value on which consensus is achieved.

It is clear from these figures that BGA-1 converges to its final value faster than the other 
algorithms. As we will see below, this is because BGA-1 generally achieves a lower accuracy (in 
terms of mean squared error, r(t)) than the other methods. 

The algorithms BGA-2, BBGA, and UBGA all maintain companion variables. Among these
algorithms we observe that BGA-2 converges more slowly, in general, than BBGA. Also note that
BBGA-opt converges significantly faster than BBGA-0.5 when n = 50 or 100, but the performance of the
two is much closer for larger networks. Somewhat surprisingly, the deviation of UBGA-opt and
UBGA-0.5 is typically better than or comparable to that of BBGA-opt. This is surprising because it
indicates that UBGA is converging faster, despite the fact that it is converging to the average consensus. On
the other hand, BGA-1 converges quickly to a consensus which is not on the average, and BGA-2
typically converges to the average consensus but more slowly than UBGA. We conclude that UBGA
strikes a desirable balance between converging quickly and achieving consensus on the average.



Figure 11: The deviation of BGA-1, BGA-2, BBGA and UBGA with respect to the number of
broadcasts on undirected random geometric graphs with slope initial values. 




(a) n = 50 (b) n = 100 (c) n = 500 



Figure 12: The deviation of BGA-1, BGA-2, BBGA and UBGA with respect to the number of 
broadcasts on undirected random geometric graphs with spike initial values. 



9.2 Mean Squared Error 



Figs. 13-16 show the mean squared error $r(t) = \frac{1}{n} \| x(t) - \frac{1}{n} \mathbf{1}\mathbf{1}^T x(0) \|_2^2$ as a
function of t for all four algorithms and the four different initializations on networks of n = 50, 100,
and 500 nodes. UBGA generally has the best performance among all algorithms, in the sense that a small
error is achieved with relatively few broadcasts. BGA-1 has a high error; it is well known that
it converges quickly but that it does not converge to the average consensus. When out-degree
information is available UBGA is preferable. For networks with n = 50 or 100 nodes, using $\epsilon = 0.5$
is close enough to optimal that the performance is extremely good for UBGA-0.5. For larger graphs,
the performance of UBGA-opt dominates that of UBGA-0.5. An interesting open problem is to
come up with a better practical guideline for setting $\epsilon$ as a function of network size and structure,
e.g., for random geometric graphs.
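The relationship between the mean squared error and the deviation of Section 9.1 can be made explicit. The following is a short derivation under the assumption that both quantities are normalized by 1/n and use the squared Euclidean norm; the shorthand $\bar{x}(t) = \frac{1}{n}\mathbf{1}^T x(t)$ for the network average at time t is introduced here for illustration only.

```latex
\begin{align*}
r(t) &= \frac{1}{n}\,\bigl\| x(t) - \bar{x}(0)\mathbf{1} \bigr\|_2^2 \\
     &= \frac{1}{n}\,\bigl\| \bigl(x(t) - \bar{x}(t)\mathbf{1}\bigr)
        + \bigl(\bar{x}(t) - \bar{x}(0)\bigr)\mathbf{1} \bigr\|_2^2 \\
     &= q(t) + \bigl(\bar{x}(t) - \bar{x}(0)\bigr)^2 .
\end{align*}
```

The cross term vanishes because $\mathbf{1}^T\bigl(x(t) - \bar{x}(t)\mathbf{1}\bigr) = 0$. Under these assumptions, an algorithm that reaches consensus quickly on a value other than the initial average has $q(t) \to 0$ while $r(t)$ levels off at the squared drift of the average, which is consistent with the behavior reported for BGA-1.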

It is interesting to note that BBGA has better performance than BGA-2 when the number
of broadcasts is small. Since BGA-2 converges to the average consensus in most examples, but BBGA does
not, this indicates that BGA-2 converges more slowly than BBGA. For larger networks BBGA may be
preferable as an alternative which quickly reaches a reasonably accurate solution.

Figure 13: The mean squared error of BGA-1, BGA-2, BBGA and UBGA with respect to the
number of broadcasts on undirected random geometric graphs with uniform initial values. 




Figure 14: The mean squared error of BGA-1, BGA-2, BBGA and UBGA with respect to the 
number of broadcasts on undirected random geometric graphs with Gaussian initial values. 




(a) n = 50 (b) n = 100 (c) n = 500

Figure 15: The mean squared error of BGA-1, BGA-2, BBGA and UBGA with respect to the 
number of broadcasts on undirected random geometric graphs with slope initial values. 



(a) n = 50 (b) n = 100 (c) n = 500

Figure 16: The mean squared error of BGA-1, BGA-2, BBGA and UBGA with respect to the
number of broadcasts on undirected random geometric graphs with spike initial values.



10 Conclusion and future work

In this paper, we propose a framework for broadcast gossip algorithms and prove that consensus
is achieved both in expectation and in the mean squared sense for reasonably chosen coefficients.
We then analyze two particular broadcast gossip algorithms, UBGA and BBGA, where the
former preserves the average and the latter is more practical for implementation in non-symmetric
broadcast networks. These algorithms have an interpretation from the perspective of matrix
perturbation, and we derive an upper bound on the perturbation parameter under which convergence
is guaranteed. We also study the optimal value of the perturbation parameter and find that it is within
the range of allowable values. Numerical experiments show that the optimal perturbation parameter
obtained for BBGA on undirected graphs also works well on digraphs. If the out-degree
information is available (as is the case in undirected networks), we demonstrate that UBGA
outperforms the existing state-of-the-art broadcast gossip algorithms. When out-degree information
is not available, BBGA is a promising alternative because it exhibits an excellent tradeoff between
the rate of convergence and the limiting mean squared error.

Interesting future work includes studying convergence properties of broadcast gossip algorithms 
with quantized transmissions. The broadcast gossip algorithms proposed in this paper involve 
maintaining and transmitting companion variables, in addition to the state variables which are 
being averaged, and we are interested in understanding how the number of bits allocated to these 
two different values impacts the rate of convergence and limiting value. 

Finally, since the wireless medium is shared by nodes within communication range of each other,
broadcast packets are likely to undergo collisions and interference, and it would also be interesting
to develop a deeper understanding of how broadcast gossip algorithms behave under more realistic
channel models (e.g., accounting for capture effects).